dcmeet second 07-07-11
Mining Association Rules using Optimal Genetic Algorithm & Quantum Swarm Intelligent PSO
Presented by K. Indira
Under the guidance of Dr. S. Kanmani, Professor, Department of Information Technology, Pondicherry Engineering College.
Contents
Objective.
Introduction.
    Data Mining.
    Association Analysis.
    Limitations of the existing system.
    GA and PSO – An Introduction.
Existing Work.
    Based on GA.
    Based on PSO.
Work Done So Far.
Proposed Work.
Execution Plan.
Papers Published.
References.
Objective
To propose an efficient methodology for mining association rules (ARs) using an Optimal Genetic Algorithm and Quantum Swarm Intelligent PSO.
Data Mining
Data mining is the extraction of interesting information or patterns from data in large databases.
Association Analysis
• Association analysis is the discovery of what are commonly called association rules.
• It studies the frequency of items occurring together in transactional databases.
• Association rule mining provides valuable information for assessing significant correlations.
Association Rules
Find all rules X → Y with minimum support and confidence.
• Support, s: the probability that a transaction contains X ∪ Y.
• Confidence, c: the conditional probability that a transaction containing X also contains Y.

Tid   Items bought
10    Milk, Nuts, Sugar
20    Milk, Coffee, Sugar
30    Milk, Sugar, Eggs
40    Nuts, Eggs, Bread
50    Nuts, Coffee, Sugar, Eggs, Bread

[Figure: Venn diagram of customers who buy milk, customers who buy sugar, and customers who buy both]

Let minsup = 50%, minconf = 50%.
Frequent patterns: Milk:3, Nuts:3, Sugar:4, Eggs:3, {Milk, Sugar}:3
Association rules: Milk → Sugar (60%, 100%), Sugar → Milk (60%, 75%)
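The support and confidence figures for the milk and sugar rules can be checked with a short Python sketch. The five transactions are the ones on the slide; the function names `support` and `confidence` are illustrative and not taken from the presentation.

```python
# Transactions from the slide's example table (Tid -> items bought).
transactions = {
    10: {"Milk", "Nuts", "Sugar"},
    20: {"Milk", "Coffee", "Sugar"},
    30: {"Milk", "Sugar", "Eggs"},
    40: {"Nuts", "Eggs", "Bread"},
    50: {"Nuts", "Coffee", "Sugar", "Eggs", "Bread"},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(antecedent, consequent):
    """Conditional probability of `consequent` given `antecedent`."""
    both = set(antecedent) | set(consequent)
    return support(both) / support(antecedent)

# Print both rules from the slide with their support and confidence.
for x, y in [({"Milk"}, {"Sugar"}), ({"Sugar"}, {"Milk"})]:
    print(x, "->", y,
          f"support={support(x | y):.0%}",
          f"confidence={confidence(x, y):.0%}")
```

Both rules share the support of {Milk, Sugar} (3 of 5 transactions); they differ only in the denominator of the confidence, which is the support of the antecedent.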
Limitations of Existing System
• Apriori, FP-growth, and Eclat are some of the popular algorithms for mining ARs.
• They traverse the database many times.
• I/O overhead and computational complexity are high.
• They cannot meet the requirements of large-scale database mining.
GA and PSO – An Introduction
• Evolutionary algorithms provide a robust and efficient approach to exploring large search spaces.
• A Genetic Algorithm (GA) is a procedure used to find approximate solutions to search problems through the application of the principles of evolutionary biology.
• The mechanism of Particle Swarm Optimization (PSO) is inspired by the social and cooperative behavior displayed by various species, such as birds and fish, as well as human beings.
Existing Work
Mining ARs Based on Genetic Algorithm
• Efficient Distributed Genetic Algorithm: the population is spatially partitioned into several semi-isolated nodes, each evolving in parallel and possibly exploring different regions of the search space.
• A genetic algorithm that does not take minimum support and confidence into account. It extracts the rules that have the best correlation between support and confidence.
• Improved niched Pareto genetic algorithm (INPGA): selects accurate candidates and also saves selection time by combining BNPGA and SDNPGA.
• Genetic relation algorithm (GRA) with a new operator, called guided mutation, is introduced. GRA considers the correlation coefficient between nodes in each individual.
Existing Work contd.
Mining ARs Based on Particle Swarm Optimization
• A novel algorithm for association rule mining that improves computational efficiency and automatically determines suitable threshold values.
• The algorithm operates at three evolution levels, where an adaptive inertia weight is presented. A safety distance, used to move the particle through its current position, and a proximity index are introduced.
• A self-adaptive method that adjusts the inertia weight of the velocity update rule, based on empirical values and a negative feedback technique, is introduced; it relieves the burden of specifying the parameter values.
• Combines Particle Swarm Optimization (PSO) and Genetic Algorithms (GAs) using fuzzy logic to integrate the results of both methods and to tune parameters. The new optimization method combines the advantages of PSO and GA to give an improved FPSO + FGA hybrid approach.
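For orientation, the inertia weight mentioned in the PSO bullets is the factor w in the standard velocity update v ← w·v + c1·r1·(pbest − x) + c2·r2·(gbest − x). The sketch below shows that standard update only; it does not reproduce the adaptive schemes of the surveyed papers. The function name `pso_step` and the default values of `w`, `c1`, and `c2` are assumptions chosen for illustration.

```python
import random

def pso_step(position, velocity, personal_best, global_best,
             w=0.7, c1=1.5, c2=1.5, rng=random):
    """One standard PSO update for a single particle (lists of floats).

    w is the inertia weight; c1 and c2 weight the pull toward the
    particle's own best and the swarm's best (illustrative defaults).
    """
    new_position, new_velocity = [], []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()
        v_next = w * v + c1 * r1 * (pb - x) + c2 * r2 * (gb - x)
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity
```

An adaptive variant would recompute `w` before each call, for example from feedback on recent fitness improvement; how that is done differs from paper to paper.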
Work Done So Far
• Association rule mining was carried out using a Genetic Algorithm in Matlab 2008a.
• Mining of association rules was carried out using a Self Adaptive Genetic Algorithm in Java.
• The GA parameters were varied and the results were recorded for each case.
Mining ARs using GA in Matlab 2008a
Methodology
Selection : Tournament
Crossover Probability : Fixed (tested with 3 values)
Mutation Probability : No mutation
Fitness Function :
Dataset : Lenses, Iris, Haberman from the UCI Machine Learning Repository.
Population : Fixed ( Tested with 3 values)
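The configuration above (tournament selection, a fixed crossover probability, no mutation) can be sketched in Python as follows. The slide does not state the tournament size, the crossover operator, or the fitness function, so the tournament size of 2, single-point crossover, and the caller-supplied `fitness` callable are all assumptions, as are the function names.

```python
import random

def tournament_select(population, fitness, size=2, rng=random):
    """Return the fittest of `size` individuals drawn at random."""
    contestants = rng.sample(population, size)
    return max(contestants, key=fitness)

def single_point_crossover(parent_a, parent_b, pc, rng=random):
    """With probability `pc`, swap the tails of two equal-length parents."""
    if rng.random() < pc and len(parent_a) > 1:
        cut = rng.randint(1, len(parent_a) - 1)
        return (parent_a[:cut] + parent_b[cut:],
                parent_b[:cut] + parent_a[cut:])
    return parent_a[:], parent_b[:]

def next_generation(population, fitness, pc, rng=random):
    """Build a same-size generation using selection and crossover only."""
    children = []
    while len(children) < len(population):
        a = tournament_select(population, fitness, rng=rng)
        b = tournament_select(population, fitness, rng=rng)
        children.extend(single_point_crossover(a, b, pc, rng))
    return children[:len(population)]  # trim if the last pair overshoots
```

With no mutation, new gene values never appear after initialization, so the initial population size bounds the diversity the search can reach.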
[Figure: Flow chart of the GA]
Results Analysis
Comparison based on variation in population size

Population  No. of Instances          No. of Instances * 1.25   No. of Instances * 1.5
Dataset     Accuracy %  No. of Gen.   Accuracy %  No. of Gen.   Accuracy %  No. of Gen.
Lenses      75          7             82          12            95          17
Haberman    71          114           68          88            64          70
Iris        77          88            87          53            82          45
Comparison based on variation in minimum support and confidence

Thresholds  Sup = 0.4 & con = 0.4     Sup = 0.9 & con = 0.9     Sup = 0.9 & con = 0.2     Sup = 0.2 & con = 0.9
Dataset     Accuracy %  No. of Gen.   Accuracy %  No. of Gen.   Accuracy %  No. of Gen.   Accuracy %  No. of Gen.
Lenses      22          20            49          11            70          21            95          18
Haberman    45          68            58          83            71          90            62          75
Iris        40          28            59          37            78          48            87          55
Comparison based on variation in crossover probability

Crossover   Pc = 0.25                 Pc = 0.5                  Pc = 0.75
Dataset     Accuracy %  No. of Gen.   Accuracy %  No. of Gen.   Accuracy %  No. of Gen.
Lenses      95          8             95          16            95          13
Haberman    69          77            71          83            70          80
Iris        84          45            86          51            87          55

Comparison of the optimum value of parameters for maximum accuracy achieved

Dataset    No. of Instances   No. of Attributes   Population Size   Minimum Support   Minimum Confidence   Crossover Rate   Accuracy %
Lenses     24                 4                   36                0.2               0.9                  0.25             95
Haberman   306                3                   306               0.9               0.2                  0.5              71
Iris       150                5                   225               0.2               0.9                  0.75             87
Inferences
• The values of minimum support, minimum confidence, and population size determine the accuracy of the system more than the other GA parameters do.
• The crossover rate affects the convergence rate rather than the accuracy of the system.
• The optimum values of the GA parameters vary from dataset to dataset, and the fitness function plays a major role in optimizing the results.
• The size of the dataset and the relationships between attributes in the data contribute to the setting of the parameters.
Mining ARs using Self Adaptive GA in Java
Methodology
Selection : Roulette Wheel
Crossover Probability : Fixed (tested with 3 values)
Mutation Probability : Self Adaptive
Fitness Function :
Dataset : Lenses, Iris, Car from the UCI Machine Learning Repository.
Population : Fixed ( Tested with 3 values)
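Roulette wheel selection picks an individual with probability proportional to its fitness. The sketch below is a minimal Python version, assuming non-negative fitness values; the function name and the fallback for an all-zero population are illustrative choices, not details from the presentation (whose implementation was in Java).

```python
import random

def roulette_wheel_select(population, fitnesses, rng=random):
    """Pick one individual with probability proportional to its fitness.

    Assumes every fitness value is non-negative.
    """
    total = sum(fitnesses)
    if total <= 0:
        return rng.choice(population)  # fallback: no positive fitness
    spin = rng.random() * total
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if spin < running:
            return individual
    return population[-1]  # guards against floating-point round-off
```

Compared with tournament selection, the selection pressure here depends on the scale of the fitness values, which is one reason the choice of fitness function matters for this configuration.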
Self Adaptive GA
Procedure SAGA
Begin
    Initialize population p(k);
    Define the crossover and mutation rate;
    Do {
        Do {
            Calculate support of all k rules;
            Calculate confidence of all k rules;
            Obtain fitness;
            Select individuals for crossover / mutation;
            Calculate the average fitness of the nth and (n-1)th generation;
            Calculate the maximum fitness of the nth and (n-1)th generation;
            Based on the fitness of the selected item, calculate the new crossover and mutation rate;
            Choose the operation to be performed;
        } k times;
    }
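The procedure does not give the formula used to recompute the crossover and mutation rates. As a rough illustration only, the Python sketch below substitutes a widely used adaptive scheme in which an individual at or above the average fitness has its rate scaled by (f_max − f) / (f_max − f_avg), while weaker individuals keep the full rate. This is not necessarily the rule used in the presented work, and the function name, parameters, and constants are hypothetical.

```python
def adaptive_rate(fitness, avg_fitness, max_fitness, high_rate, low_floor=0.0):
    """Scale an operator rate down as an individual's fitness nears the best.

    Hypothetical helper: `high_rate` is the rate applied to below-average
    individuals, `low_floor` is the smallest rate ever returned.
    """
    if fitness < avg_fitness or max_fitness == avg_fitness:
        # Below-average individual, or a uniform population: keep exploring.
        return high_rate
    scale = (max_fitness - fitness) / (max_fitness - avg_fitness)
    return max(low_floor, high_rate * scale)

# Example: rates for three individuals in a generation with
# average fitness 0.6 and maximum fitness 0.9.
for f in (0.5, 0.75, 0.9):
    print(f, adaptive_rate(f, avg_fitness=0.6, max_fitness=0.9, high_rate=0.8))
```

Under a rule of this kind the best rules in a generation are disturbed least, which is consistent with the procedure's use of the average and maximum fitness, although the procedure also compares the nth and (n-1)th generations, a step this sketch does not model.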
SELF ADAPTIVE
Results Analysis
Accuracy comparison between GA and SAGA when parameters are ideal for traditional GA

                 Traditional GA            Self Adaptive GA
Dataset          Accuracy    No. of Gen.   Accuracy    No. of Gen.
Lenses           75          38            87.5        35
Haberman         52          36            68          28
Car Evaluation   85          29            96          21

Accuracy comparison between GA and SAGA when parameters are set according to the termination of SAGA

                 Traditional GA            Self Adaptive GA
Dataset          Accuracy    No. of Gen.   Accuracy    No. of Gen.
Lenses           50          35            87.5        35
Haberman         36          38            68          28
Car Evaluation   74          36            96          21

Inferences
• Self Adaptive GA gives better accuracy than Traditional GA.
• Better convergence.
Proposed Work
1. To implement a distributed niched Pareto memetic algorithm for rule mining.
2. To propose an association rule mining algorithm based on Chaotic PSO and swarm intelligence.
3. To propose a particle swarm optimization rule mining methodology combined with quantum computing and quantum differential evolution.
Niched Pareto Selection Algorithm
• Obtain the comparison set S from clustering-based samples.
• For any two candidates and the comparison set S, if one candidate is dominated and the other is not, select the non-dominated candidate. Exit.
• If both candidates (cd_1 and cd_2) are dominated or both are non-dominated, compute the number of samples in their two niches, count1 and count2.
• If count1 = 0, cd_1 is selected; if count2 = 0, cd_2 is selected. Exit.
• If count1 - count2 > delta, cd_2 is selected; if count2 - count1 > delta, cd_1 is selected. Exit.
• If abs(count1 - count2) < delta, compute the standard deviations of the two niches, sd1 and sd2.
• If sd1 > sd2, cd_1 is selected; otherwise, cd_2 is selected.
• Exit.
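The selection steps listed for this algorithm can be sketched in Python as below. Several details are not given on the slide and are assumptions here: candidates are tuples of objectives to be maximized (for example support and confidence), a niche is the set of samples within a Euclidean `radius` of a candidate, and the standard deviation of a niche is taken over the first objective of its members. All function names are illustrative.

```python
from statistics import pstdev

def dominates(a, b):
    """True if `a` is no worse than `b` on every objective and better on one."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def niche(candidate, samples, radius):
    """Samples within `radius` (Euclidean distance) of the candidate."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    return [s for s in samples if dist(candidate, s) <= radius]

def spread(members):
    """Standard deviation of a niche over its first objective (assumption)."""
    return pstdev([m[0] for m in members]) if len(members) > 1 else 0.0

def niched_pareto_select(cd_1, cd_2, comparison_set, samples, delta, radius):
    dominated_1 = any(dominates(s, cd_1) for s in comparison_set)
    dominated_2 = any(dominates(s, cd_2) for s in comparison_set)
    if dominated_1 != dominated_2:       # exactly one candidate is dominated
        return cd_2 if dominated_1 else cd_1
    niche_1 = niche(cd_1, samples, radius)
    niche_2 = niche(cd_2, samples, radius)
    count1, count2 = len(niche_1), len(niche_2)
    if count1 == 0:
        return cd_1
    if count2 == 0:
        return cd_2
    if count1 - count2 > delta:          # cd_2 sits in the less crowded niche
        return cd_2
    if count2 - count1 > delta:
        return cd_1
    # Counts are close (a difference equal to delta is also handled here).
    return cd_1 if spread(niche_1) > spread(niche_2) else cd_2
```

Preferring the candidate in the less crowded niche is what keeps the population spread along the Pareto front instead of collapsing onto one region.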
Distributed Model
[Figure labels: Full Dataset; GA1, GA2, GA3, and GA4 subpopulations; Rules Generated (one per subpopulation); Concept Description]
Association Rule Mining Algorithm based on Chaotic PSO and Swarm Intelligence
[Figure labels: Based on chaotic maps; Swarm Intelligence; Concept]
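The slide names chaotic maps without saying which one is intended. As one common example, the logistic map x ← r·x·(1 − x) with r = 4 produces a deterministic but irregular sequence in [0, 1]; chaotic PSO variants typically draw such values in place of uniform random numbers. The Python sketch below is illustrative only, and the function name and starting value are assumptions.

```python
def logistic_map(x0, steps, r=4.0):
    """Yield `steps` successive values of the logistic map x <- r*x*(1-x)."""
    x = x0
    for _ in range(steps):
        x = r * x * (1.0 - x)
        yield x

# Example: a short chaotic sequence that could stand in for random draws.
sequence = list(logistic_map(0.3, 5))
print(sequence)
```

The starting value should avoid fixed points such as 0, 0.75, and 1, from which the sequence stops changing or collapses to 0.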
Execution Plan
July                : Niched Pareto sampling based selection. Implementing µGA for local intensity search.
August              : Distributed methodology implementation. Preparing the above work as a paper.
September & October : Particle Swarm Optimization based rule mining to be implemented.
November            : Chaotic PSO & swarm intelligence based PSO for mining ARs to be implemented. Documenting the same as a paper.
December & January  : Study of quantum computing and differential evolution concepts.
Papers Published
The paper titled “Framework for Comparison of Association Rule Mining Using Genetic Algorithm” was presented at the International Conference on Computers, Communication & Intelligence at VCET, 2010.
The paper titled “Mining Association Rules Using Genetic Algorithm: The Role of Estimation Parameters” has been selected for presentation at the International Conference on Advances in Computing and Communications, 2011. To be published in the Springer LNCS (CCIS) series.
The paper titled “Rule Acquisition in Data Mining Using a Self Adaptive Genetic Algorithm” has been selected for presentation at the First International Conference on Computer Science and Information Technology (CCSEIT-2011). To be published in the Springer LNCS (CCIS) series.
References
Jing Li, Han Rui-feng, “A Self-Adaptive Genetic Algorithm Based On Real-Coded”, International Conference on Biomedical Engineering and Computer Science, Page(s): 1–4, 2010.
Chuan-Kang Ting, Wei-Ming Zeng, Tzu-Chieh Lin, “Linkage Discovery through Data Mining”, IEEE Magazine on Computational Intelligence, Volume 5, February 2010.
Caises, Y., Leyva, E., Gonzalez, A., Perez, R., “An extension of the Genetic Iterative Approach for Learning Rule Subsets”, 4th International Workshop on Genetic and Evolutionary Fuzzy Systems, Page(s): 63–67, 2010.
Shangping Dai, Li Gao, Qiang Zhu, Changwu Zhu, “A Novel Genetic Algorithm Based on Image Databases for Mining Association Rules”, 6th IEEE/ACIS International Conference on Computer and Information Science, Page(s): 977–980, 2007.
Peregrin, A., Rodriguez, M.A., “Efficient Distributed Genetic Algorithm for Rule Extraction”, Eighth International Conference on Hybrid Intelligent Systems, HIS '08, Page(s): 531–536, 2008.
Mansoori, E.G., Zolghadri, M.J., Katebi, S.D., “SGERD: A Steady-State Genetic Algorithm for Extracting Fuzzy Classification Rules From Data”, IEEE Transactions on Fuzzy Systems, Volume: 16, Issue: 4, Page(s): 1061–1071, 2008.
Xiaoyuan Zhu, Yongquan Yu, Xueyan Guo, “Genetic Algorithm Based on Evolution Strategy and the Application in Data Mining”, First International Workshop on Education Technology and Computer Science, ETCS '09, Volume: 1, Page(s): 848–852, 2009.
Hong Guo, Ya Zhou, “An Algorithm for Mining Association Rules Based on Improved Genetic Algorithm and its Application”, 3rd International Conference on Genetic and Evolutionary Computing, WGEC '09, Page(s): 117–120, 2009.
Genxiang Zhang, Haishan Chen, “Immune Optimization Based Genetic Algorithm for Incremental Association Rules Mining”, International Conference on Artificial Intelligence and Computational Intelligence, AICI '09, Volume: 4, Page(s): 341–345, 2009.
Maria J. Del Jesus, Jose A. Gamez, Pedro Gonzalez, Jose M. Puerta, “On the Discovery of Association Rules by means of Evolutionary Algorithms”, from Advanced Review of John Wiley & Sons, Inc., 2011.
Junli Lu, Fan Yang, Momo Li, Lizhen Wang, “Multi-objective Rule Discovery Using the Improved Niched Pareto Genetic Algorithm”, Third International Conference on Measuring Technology and Mechatronics Automation, 2011.
Hamid Reza Qodmanan, Mahdi Nasiri, Behrouz Minaei-Bidgoli, “Multi Objective Association Rule Mining with Genetic Algorithm without specifying Minimum Support and Minimum Confidence”, Expert Systems with Applications 38 (2011) 288–298.
Miguel Rodriguez, Diego M. Escalante, Antonio Peregrin, “Efficient Distributed Genetic Algorithm for Rule Extraction”, Applied Soft Computing 11 (2011) 733–743.
J.H. Ang, K.C. Tan, A.A. Mamun, “An Evolutionary Memetic Algorithm for Rule Extraction”, Expert Systems with Applications 37 (2010) 1302–1315.
R.J. Kuo, C.M. Chao, Y.T. Chiu, “Application of particle swarm optimization to association rule mining”, Applied Soft Computing 11 (2011) 326–336.
Bilal Alatas, Erhan Akin, “Multi-objective rule mining using a chaotic particle swarm optimization algorithm”, Knowledge-Based Systems 22 (2009) 455–460.
Mourad Ykhlef, “A Quantum Swarm Evolutionary Algorithm for mining association rules in large databases”, Journal of King Saud University – Computer and Information Sciences (2011) 23, 1–6.
Haijun Su, Yupu Yang, Liang Zhao, “Classification rule discovery with DE/QDE algorithm”, Expert Systems with Applications 37 (2010) 1216–1222.
Yan Chen, Shingo Mabu, Kotaro Hirasawa, “Genetic relation algorithm with guided mutation for the large-scale portfolio optimization”, Expert Systems with Applications 38 (2011) 3353–3363.
Yamina Mohamed Ben Ali, “Soft Adaptive Particle Swarm Algorithm for Large Scale Optimization”, IEEE 2010.
Feng Lu, Yanfeng Ge, LiQun Gao, “Self-adaptive Particle Swarm Optimization Algorithm for Global Optimization”, Sixth International Conference on Natural Computation (ICNC 2010), 2010.
Fevrier Valdez, Patricia Melin, Oscar Castillo, “An improved evolutionary method with fuzzy logic for combining Particle Swarm Optimization and Genetic Algorithms”, Applied Soft Computing 11 (2011) 2625–2632.
Thank You