genetic algorithm clustering

Upload: harisxyz

Post on 02-Jun-2018

234 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/10/2019 Genetic Algorithm Clustering

    1/26

    Basic concepts of Data Mining,

    Clustering and Genetic Algorithms

    Tsai-Yang Jea

    Department of

    Computer Science and EngineeringSUNY at Buffalo

  • 8/10/2019 Genetic Algorithm Clustering

    2/26

    Data Mining Motivation

    Mechanical production of data need for

    mechanical consumption of data

    Large databases = vast amounts of information

    Difficulty lies in accessing it

  • 8/10/2019 Genetic Algorithm Clustering

    3/26

    KDD and Data Mining

    KDD:Extraction of knowledge from data

    non-trivial extraction of implicit,previously unknown & potentially useful

    knowledge from data

    Data Mining:Discovery stage of the KDD

    process

  • 8/10/2019 Genetic Algorithm Clustering

    4/26

    Data Mining Techniques

    Query tools

    Statistical techniques

    Visualization

    On-line analyticalprocessing (OLAP)

    Clustering

    Classification

    Decision trees

    Association rules

    Neural networks

    Genetic algorithms

    Any technique that helps to extract more out of data is useful

  • 8/10/2019 Genetic Algorithm Clustering

    5/26

    Whats Clustering

    Clustering is a kind of unsupervised learning.

    Clustering is a method of grouping data that share

    similar trend and patterns. Clustering of data is a method by which large sets

    of data is grouped into clusters of smaller sets ofsimilar data. Example:

    Thus, we see clustering means grouping of data or dividing a large

    data set into smaller data sets of some similarity.

    After clustering:

  • 8/10/2019 Genetic Algorithm Clustering

    6/26

    The usage of clustering

    Some engineering sciences such as pattern recognition, artificial

    intelligence have been using the concepts of cluster analysis. Typical

    examples to which clustering has been applied include handwritten

    characters, samples of speech, fingerprints, and pictures. In the life sciences (biology, botany, zoology, entomology, cytology,

    microbiology), the objects of analysis are life forms such as plants,

    animals, and insects. The clustering analysis may range from

    developing complete taxonomies to classification of the species into

    subspecies. The subspecies can be further classified into subspecies. Clustering analysis is also widely used in information, policy and

    decision sciences. The various applications of clustering analysis to

    documents include votes on political issues, survey of markets, survey

    of products, survey of sales programs, and R & D.

  • 8/10/2019 Genetic Algorithm Clustering

    7/26

    A Clustering Example

    Income: HighChildren:1

    Car:Luxury

    Income: Low

    Children:0Car:Compact

    Car: Sedan and

    Children:3

    Income: Medium

    Income: Medium

    Children:2

    Car:Truck

    Cluster 1

    Cluster 2

    Cluster 3

    Cluster 4

  • 8/10/2019 Genetic Algorithm Clustering

    8/26

    Different ways of representing

    clusters

    (b)

    ad

    k jh

    gif

    e

    cb

    ad

    k jh

    g if

    e

    cb

    (a)

    (c) 1 2 3

    a

    bc

    0.4 0.1 0.5

    0.1 0.8 0.10.3 0.3 0.4

    ...

    (d)

    g a c i e d k b j f h

  • 8/10/2019 Genetic Algorithm Clustering

    9/26

    K Means Clustering

    (Iterative distance-based clustering)

    K means clustering is an effective algorithm

    to extract a given numberof clusters ofpatterns from a training set. Once done, the

    cluster locations can be used to classify

    patterns into distinct classes.

  • 8/10/2019 Genetic Algorithm Clustering

    10/26

    K means clustering

    (Cont.)

    Select the k cluster centers randomly.

    Store the kcluster centers.

    .clustertobelongingpatternth-theisand,

    clusterinpatternstrainingofnumbertheismean,newtheiswhere

    1

    :clustertheofmeanthefindingbycenteritsrecomputecluster,eachFor

    1

    kjXk

    NM

    XNM

    jk

    kk

    N

    j

    jk

    k

    k

    k

    .ofmemberaasclassifyandcenterclusternearestthe

    findset,trainingin thepaterneachForset.trainingentireheClassify t

    CXC

    X

    i

    i

    Loop until the

    change in cluster

    means is less the

    amount specified

    by the user.

  • 8/10/2019 Genetic Algorithm Clustering

    11/26

    The drawbacks of K-means

    clustering

    The final clusters do not represent a global

    optimization result but only the local one,

    and complete different final clusters canarise from difference in the initial randomly

    chosen cluster centers. (fig. 1)

    We have to know how many clusters wewill have at the first.

  • 8/10/2019 Genetic Algorithm Clustering

    12/26

    Drawback of K-means clustering

    (Cont.)

    Figure 1

  • 8/10/2019 Genetic Algorithm Clustering

    13/26

    Clustering with Genetic

    Algorithm

    Introduction of Genetic Algorithm

    Elements consisting GAs

    Genetic Representation

    Genetic operators

  • 8/10/2019 Genetic Algorithm Clustering

    14/26

    Introduction of GAs

    Inspired by biological evolution.

    Many operators mimic the process of the

    biological evolution including

    Natural selection

    CrossoverMutation

  • 8/10/2019 Genetic Algorithm Clustering

    15/26

    Elements consisting GAs

    Individual (chromosome):

    feasible solution in an optimization

    problem Population

    Set of individuals

    Should be maintained in each generation

  • 8/10/2019 Genetic Algorithm Clustering

    16/26

    Elements consisting GAs

    Genetic operators. (crossover, mutation)

    Define the fitness function.

    The fitness function takes a singlechromosome as input and returns ameasure of the goodness of the solutionrepresented by the chromosome.

  • 8/10/2019 Genetic Algorithm Clustering

    17/26

    Genetic Representation

    The most important starting point to develop a

    genetic algorithm

    Each gene has its special meaning Based on this representation, we can define

    fitness evaluation function,

    crossover operator,

    mutation operator.

  • 8/10/2019 Genetic Algorithm Clustering

    18/26

    Genetic Representation (Cont.)

    Examples 1

    Outlook

    0

    Wind

    1

    PlayTennis

    1

    Overcast

    Rain

    Sunny

    1 1

    Strong

    Normal

    Yes

    No

    0 0If Outlook isOvercast or RainandWind is Strong,thenPlayTennis = Yes

    0 1 1 1 0 1 0

    A chromosome

    Gene

    Allele value

  • 8/10/2019 Genetic Algorithm Clustering

    19/26

    Genetic Representation (Cont.)

    Examples 2 ( In clustering problem)

    Each chromosome represents a set of clusters; each

    gene represents an object; each allele value represents acluster. Genes with the same allele value are in the

    same cluster.

    1 2 1 4 3 5 5

    A B C D E F G

  • 8/10/2019 Genetic Algorithm Clustering

    20/26

    Crossover

    Exchange featuresof two individuals to produce

    two offspring (children)

    Selected mates may have good properties tosurvive in next generations

    So, we can expect that exchanging features may

    produce other good individuals

  • 8/10/2019 Genetic Algorithm Clustering

    21/26

    Crossover (cont.)

    Single-point Crossover

    Two-point Crossover

    Uniform Crossover

    1 1 01 1

    0 0 00 1

    0 0 1 0 0 0

    0 1 0 1 0 1

    1 1 01 1

    0 0 00 1

    0 1 0 1 0 1

    0 0 1 0 0 0

    1 1 01 1

    0 0 00 1

    0 0 1 0 0 0

    0 1 0 1 0 1

    1 1 00 1

    0 0 01 1

    0 1 1 0 0 0

    0 0 0 1 0 1

    1 0 10 1 0 1 0 0 1 1

    1 1 01 1

    0 0 00 1

    0 0 1 0 0 0

    0 1 0 1 0 1

    1 0 00 1

    0 1 01 1

    0 0 0 1 0 0

    0 1 1 0 0 1

    Crossover template

  • 8/10/2019 Genetic Algorithm Clustering

    22/26

    Mutation

    Usually change a single bit in a bit string

    This operator should happen with very lowprobability.0 1 01 1

    0 1 11 1

    Mutation point

    (random)

  • 8/10/2019 Genetic Algorithm Clustering

    23/26

    Typical Procedures

    Crossover mates are probabilistically

    selected based on theirfitnessvalue.

    0 1 00 11 1 01 0

    0 0 11 10 1 01 1

    1 1 01 01 1 01 1

    1 1 01 1

    0 1 00 1

    1 1 00 1

    0 1 01 1

    Crossover point

    randomly selected

    1 1 00 1

    0 1 11 1

    0 1 11 1

    oldgeneration

    newgeneration0 1 01 1

    1 1 01 01 1 01 1

    Mutation point

    (random)

    Probabilisticallyselect individuals

  • 8/10/2019 Genetic Algorithm Clustering

    24/26

    Preparing the chromosomes

    Defining genetic operators Fusion: takes two unique allele values and combines them into a

    single allele value, combining two clusters into one.

    Fission: takes a single allele value and gives it a different random

    allele value, breaking a cluster apart.

    Defining fitness functions

    How to apply GA on a clustering

    problem

    1 2 33 5

    1 2 33 5 3 2 33 5

    1 3 33 5 1 3 44 5

  • 8/10/2019 Genetic Algorithm Clustering

    25/26

    Example: (Cont.)

    Crossover

    Mutation

    Fusion

    Fission

    Old generation

    New generation

    Select the chromosomes

    according to the fitness

    function.

    1 2 33 51 2 43 5

    2 1 33 52 2 43 5

    1 1 13 5

    2 2 32 51 2 53 5

    2 4 33 42 2 43 5

    2 1 23 5

  • 8/10/2019 Genetic Algorithm Clustering

    26/26

    Finally