genetic algorithm clustering
TRANSCRIPT
-
8/10/2019 Genetic Algorithm Clustering
1/26
Basic concepts of Data Mining,
Clustering and Genetic Algorithms
Tsai-Yang Jea
Department of
Computer Science and EngineeringSUNY at Buffalo
-
8/10/2019 Genetic Algorithm Clustering
2/26
Data Mining Motivation
Mechanical production of data need for
mechanical consumption of data
Large databases = vast amounts of information
Difficulty lies in accessing it
-
8/10/2019 Genetic Algorithm Clustering
3/26
KDD and Data Mining
KDD:Extraction of knowledge from data
non-trivial extraction of implicit,previously unknown & potentially useful
knowledge from data
Data Mining:Discovery stage of the KDD
process
-
8/10/2019 Genetic Algorithm Clustering
4/26
Data Mining Techniques
Query tools
Statistical techniques
Visualization
On-line analyticalprocessing (OLAP)
Clustering
Classification
Decision trees
Association rules
Neural networks
Genetic algorithms
Any technique that helps to extract more out of data is useful
-
8/10/2019 Genetic Algorithm Clustering
5/26
Whats Clustering
Clustering is a kind of unsupervised learning.
Clustering is a method of grouping data that share
similar trend and patterns. Clustering of data is a method by which large sets
of data is grouped into clusters of smaller sets ofsimilar data. Example:
Thus, we see clustering means grouping of data or dividing a large
data set into smaller data sets of some similarity.
After clustering:
-
8/10/2019 Genetic Algorithm Clustering
6/26
The usage of clustering
Some engineering sciences such as pattern recognition, artificial
intelligence have been using the concepts of cluster analysis. Typical
examples to which clustering has been applied include handwritten
characters, samples of speech, fingerprints, and pictures. In the life sciences (biology, botany, zoology, entomology, cytology,
microbiology), the objects of analysis are life forms such as plants,
animals, and insects. The clustering analysis may range from
developing complete taxonomies to classification of the species into
subspecies. The subspecies can be further classified into subspecies. Clustering analysis is also widely used in information, policy and
decision sciences. The various applications of clustering analysis to
documents include votes on political issues, survey of markets, survey
of products, survey of sales programs, and R & D.
-
8/10/2019 Genetic Algorithm Clustering
7/26
A Clustering Example
Income: HighChildren:1
Car:Luxury
Income: Low
Children:0Car:Compact
Car: Sedan and
Children:3
Income: Medium
Income: Medium
Children:2
Car:Truck
Cluster 1
Cluster 2
Cluster 3
Cluster 4
-
8/10/2019 Genetic Algorithm Clustering
8/26
Different ways of representing
clusters
(b)
ad
k jh
gif
e
cb
ad
k jh
g if
e
cb
(a)
(c) 1 2 3
a
bc
0.4 0.1 0.5
0.1 0.8 0.10.3 0.3 0.4
...
(d)
g a c i e d k b j f h
-
8/10/2019 Genetic Algorithm Clustering
9/26
K Means Clustering
(Iterative distance-based clustering)
K means clustering is an effective algorithm
to extract a given numberof clusters ofpatterns from a training set. Once done, the
cluster locations can be used to classify
patterns into distinct classes.
-
8/10/2019 Genetic Algorithm Clustering
10/26
K means clustering
(Cont.)
Select the k cluster centers randomly.
Store the kcluster centers.
.clustertobelongingpatternth-theisand,
clusterinpatternstrainingofnumbertheismean,newtheiswhere
1
:clustertheofmeanthefindingbycenteritsrecomputecluster,eachFor
1
kjXk
NM
XNM
jk
kk
N
j
jk
k
k
k
.ofmemberaasclassifyandcenterclusternearestthe
findset,trainingin thepaterneachForset.trainingentireheClassify t
CXC
X
i
i
Loop until the
change in cluster
means is less the
amount specified
by the user.
-
8/10/2019 Genetic Algorithm Clustering
11/26
The drawbacks of K-means
clustering
The final clusters do not represent a global
optimization result but only the local one,
and complete different final clusters canarise from difference in the initial randomly
chosen cluster centers. (fig. 1)
We have to know how many clusters wewill have at the first.
-
8/10/2019 Genetic Algorithm Clustering
12/26
Drawback of K-means clustering
(Cont.)
Figure 1
-
8/10/2019 Genetic Algorithm Clustering
13/26
Clustering with Genetic
Algorithm
Introduction of Genetic Algorithm
Elements consisting GAs
Genetic Representation
Genetic operators
-
8/10/2019 Genetic Algorithm Clustering
14/26
Introduction of GAs
Inspired by biological evolution.
Many operators mimic the process of the
biological evolution including
Natural selection
CrossoverMutation
-
8/10/2019 Genetic Algorithm Clustering
15/26
Elements consisting GAs
Individual (chromosome):
feasible solution in an optimization
problem Population
Set of individuals
Should be maintained in each generation
-
8/10/2019 Genetic Algorithm Clustering
16/26
Elements consisting GAs
Genetic operators. (crossover, mutation)
Define the fitness function.
The fitness function takes a singlechromosome as input and returns ameasure of the goodness of the solutionrepresented by the chromosome.
-
8/10/2019 Genetic Algorithm Clustering
17/26
Genetic Representation
The most important starting point to develop a
genetic algorithm
Each gene has its special meaning Based on this representation, we can define
fitness evaluation function,
crossover operator,
mutation operator.
-
8/10/2019 Genetic Algorithm Clustering
18/26
Genetic Representation (Cont.)
Examples 1
Outlook
0
Wind
1
PlayTennis
1
Overcast
Rain
Sunny
1 1
Strong
Normal
Yes
No
0 0If Outlook isOvercast or RainandWind is Strong,thenPlayTennis = Yes
0 1 1 1 0 1 0
A chromosome
Gene
Allele value
-
8/10/2019 Genetic Algorithm Clustering
19/26
Genetic Representation (Cont.)
Examples 2 ( In clustering problem)
Each chromosome represents a set of clusters; each
gene represents an object; each allele value represents acluster. Genes with the same allele value are in the
same cluster.
1 2 1 4 3 5 5
A B C D E F G
-
8/10/2019 Genetic Algorithm Clustering
20/26
Crossover
Exchange featuresof two individuals to produce
two offspring (children)
Selected mates may have good properties tosurvive in next generations
So, we can expect that exchanging features may
produce other good individuals
-
8/10/2019 Genetic Algorithm Clustering
21/26
Crossover (cont.)
Single-point Crossover
Two-point Crossover
Uniform Crossover
1 1 01 1
0 0 00 1
0 0 1 0 0 0
0 1 0 1 0 1
1 1 01 1
0 0 00 1
0 1 0 1 0 1
0 0 1 0 0 0
1 1 01 1
0 0 00 1
0 0 1 0 0 0
0 1 0 1 0 1
1 1 00 1
0 0 01 1
0 1 1 0 0 0
0 0 0 1 0 1
1 0 10 1 0 1 0 0 1 1
1 1 01 1
0 0 00 1
0 0 1 0 0 0
0 1 0 1 0 1
1 0 00 1
0 1 01 1
0 0 0 1 0 0
0 1 1 0 0 1
Crossover template
-
8/10/2019 Genetic Algorithm Clustering
22/26
Mutation
Usually change a single bit in a bit string
This operator should happen with very lowprobability.0 1 01 1
0 1 11 1
Mutation point
(random)
-
8/10/2019 Genetic Algorithm Clustering
23/26
Typical Procedures
Crossover mates are probabilistically
selected based on theirfitnessvalue.
0 1 00 11 1 01 0
0 0 11 10 1 01 1
1 1 01 01 1 01 1
1 1 01 1
0 1 00 1
1 1 00 1
0 1 01 1
Crossover point
randomly selected
1 1 00 1
0 1 11 1
0 1 11 1
oldgeneration
newgeneration0 1 01 1
1 1 01 01 1 01 1
Mutation point
(random)
Probabilisticallyselect individuals
-
8/10/2019 Genetic Algorithm Clustering
24/26
Preparing the chromosomes
Defining genetic operators Fusion: takes two unique allele values and combines them into a
single allele value, combining two clusters into one.
Fission: takes a single allele value and gives it a different random
allele value, breaking a cluster apart.
Defining fitness functions
How to apply GA on a clustering
problem
1 2 33 5
1 2 33 5 3 2 33 5
1 3 33 5 1 3 44 5
-
8/10/2019 Genetic Algorithm Clustering
25/26
Example: (Cont.)
Crossover
Mutation
Fusion
Fission
Old generation
New generation
Select the chromosomes
according to the fitness
function.
1 2 33 51 2 43 5
2 1 33 52 2 43 5
1 1 13 5
2 2 32 51 2 53 5
2 4 33 42 2 43 5
2 1 23 5
-
8/10/2019 Genetic Algorithm Clustering
26/26
Finally