B. Verma Sir TIT: Application of GA

Upload: anuraggupta

Post on 07-Jul-2018


TRANSCRIPT

  • 8/18/2019 b. Verma Sir Tit Application of Ga

    1/30

  • 2/30

    Neural Network GA Based Hybrid System

    • Neural Networks can learn various tasks from training examples, and can classify and model non-linear relationships.

    • GAs have been used to optimize the parameters of NNs.

    • The GA encodes the parameters of the NN as strings representing chromosomes.

    • GA-NN technology has the ability to locate the neighborhood of the optimal solution quickly.

    • A large amount of memory is required to handle and manipulate the chromosomes.

  • 3/30

    Example: GA Based Backpropagation Networks

    Network size: 2-2-2; number of weights: 8

    Input layer to hidden layer: 4

    Hidden layer to output layer: 4

    Conventional NNs make use of gradient descent learning to obtain their weights.

    In conventional NNs there is the problem of local minima. The GA, although it does not guarantee the global optimum, has been found to obtain acceptably good solutions.

  • 4/30

    Chromosome Representation

    [ W11, W12, W21, W22, V11, V12, V21, V22 ]

    Each gene is a real value coded in decimal digits.

    We are considering weights up to three decimal places, so the number of digits required is 4.

    One digit is required for the sign (+/-).

    Fitness function: F = 1/RMSE
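The encoding above can be sketched in code. This is a minimal illustration rather than the slide's exact scheme: it assumes the sign digit leads each gene ('0' for +, '1' for -) and the remaining four digits encode a weight with three decimal places, and it leaves the RMSE computation to a caller-supplied function.

```python
import random

GENE_DIGITS = 5   # 1 sign digit + 4 magnitude digits (weight to 3 decimal places)
NUM_WEIGHTS = 8   # 2-2-2 network: W11..W22 and V11..V22

def random_gene():
    """One weight as a digit string: sign digit ('0' => +, '1' => -), then 4 digits."""
    return random.choice("01") + "".join(random.choice("0123456789") for _ in range(4))

def random_chromosome():
    """A chromosome is the concatenation of the eight digit-coded genes."""
    return "".join(random_gene() for _ in range(NUM_WEIGHTS))

def decode_gene(gene):
    """e.g. '12345' -> -2.345, '00500' -> 0.5"""
    sign = -1.0 if gene[0] == "1" else 1.0
    return sign * int(gene[1:]) / 1000.0

def decode_chromosome(chrom):
    """Split the digit string into genes and decode each into a weight."""
    return [decode_gene(chrom[i:i + GENE_DIGITS])
            for i in range(0, len(chrom), GENE_DIGITS)]

def fitness(chrom, rmse_fn):
    """Slide's fitness F = 1/RMSE; rmse_fn evaluates the network built from the
    decoded weights. A small epsilon guards against division by zero."""
    return 1.0 / (rmse_fn(decode_chromosome(chrom)) + 1e-9)
```

A GA would then evolve these digit strings with ordinary crossover and mutation, ranking individuals by `fitness`.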

  • 5/30

    Need for Data Mining

    Traditional statistical techniques and data management tools are no longer adequate for analyzing the vast collections of data being generated.

    Example domains:

    Financial Investment: stock indexes and prices, interest rates, credit card data, fraud detection

    Health Care: diagnostic information stored by hospital management systems

    Manufacturing and Production: process optimization and troubleshooting

    Telecommunication networks: calling patterns and fault management systems

    Scientific Domain: astronomical observations, genomic data, biological data

    The World Wide Web

  • 6/30

    Knowledge Discovery in Databases (KDD)

    • The term KDD refers to the overall process of knowledge discovery in databases. Data mining is a particular step in this process, involving the application of specific algorithms for extracting patterns (models) from data.

    • The additional steps in the KDD process, such as data preparation, data selection, data cleaning, data integration and proper interpretation of the results of mining, ensure that useful knowledge is derived from the data.

  • 7/30

     

    The Common Functions of Data Mining

    • Classification: classifies a data item into one of several predefined categorical classes.

    • Regression: maps a data item to a real-valued prediction variable.

    • Clustering: maps a data item into one of several clusters, where clusters are natural groupings of data items based on similarity metrics or probability density models.

    • Rule generation: extracts classification rules from the data.

  • 8/30

    The Common Functions of Data Mining

    • Discovering association rules: describes association relationships among different attributes.

    • Summarization: provides a compact description for a subset of data.

    • Dependency modeling: describes significant dependencies among variables.

    • Sequence analysis: models sequential patterns, as in time-series analysis. The goal is to model the states of the process generating the sequence, or to extract and report deviations and trends over time.

  • 9/30

    Challenges of Data Mining

    • Massive data sets and high dimensionality. Huge data sets create a combinatorially explosive search space and increase the chance that a data mining algorithm will find spurious patterns that are not generally valid. Possible solutions include robust and efficient algorithms, sampling approximation methods, and parallel processing.

    • User interaction and prior knowledge. Data mining is inherently an interactive and iterative process. User interaction is required at various stages, and domain knowledge may be used either in the form of a high-level specification of the model, or at a more detailed level.

    • Overfitting and assessing statistical significance. Data sets used for mining are usually huge and available from distributed sources. As a result, the presence of spurious data points often leads to overfitting of the models. Regularization and re-sampling methodologies need to be emphasized in model design.

  • 10/30

    Challenges of Data Mining

    • Understandability of patterns. It is necessary to make the discoveries more understandable to humans. Possible solutions include rule structuring, natural language representation, and the visualization of data and knowledge.

    • Nonstandard and incomplete data. The data can be missing and/or noisy.

    • Mixed media data. Learning from data that is represented by a combination of various media, like numeric, symbolic, images and text.

    • Management of changing data and knowledge. Rapidly changing data, in a database that is modified/deleted/augmented, may make previously discovered patterns invalid. Possible solutions include incremental methods for updating the patterns.

    • Integration. Data mining tools are often only a part of the entire decision-making system. It is desirable that they integrate smoothly, both with the database and with the final decision-making procedure.

  • 11/30

    GA for Classification Rule Discovery

    • Michigan approach: the population consists of individuals (chromosomes) where each individual encodes a single prediction rule.

    • Pittsburgh approach: each individual encodes a set of prediction rules.

    • Pluses and minuses:

    - The Pittsburgh approach directly takes rule interaction into account when computing the fitness function of an individual.

    - This approach leads to syntactically longer individuals.

    - In the Michigan approach the individuals are simpler and syntactically shorter.

    - It simplifies the design of genetic operators.

    • Take the rule: IF cond_1 AND cond_2 AND ... AND cond_n THEN class = c_i

    - Representation of the rule antecedent (the IF part)

    - Representation of the rule consequent (the THEN part)

  • 12/30

    The Rule Antecedent (using GA)

    • Often there is a conjunction of conditions.

    • Usually a binary encoding is used.

    • A given attribute can take on k discrete values. Its encoding can consist of k bits ('1' for "on", '0' for "off"):

    - 0 0 1 1 1 0 1 1 0 0 ... 0

    • All bits can be turned into '1's in order to "turn off" this condition (the attribute then matches any value).

    • Non-binary encoding is possible. Variable-length individuals will arise; crossover may have to be modified to cope with variable-length individuals.
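The k-bit condition encoding above can be sketched as follows; the attribute and its three values are hypothetical, chosen only to illustrate the idea.

```python
# A hypothetical attribute "colour" with k = 3 discrete values.
VALUES = ["red", "green", "blue"]

def matches(bits, value):
    """A k-bit condition matches an instance's value if that value's bit is 1."""
    return bits[VALUES.index(value)] == 1

def is_turned_off(bits):
    """All bits set to 1: the condition accepts every value, i.e. it is 'turned off'."""
    return all(b == 1 for b in bits)
```

For instance, the condition `[1, 0, 1]` matches "red" and "blue" but not "green", while `[1, 1, 1]` effectively removes the attribute from the rule.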

  • 13/30

    Representing the Rule Consequent (Predicted Class)

    • Three ways of representing the predicted class (the THEN part):

    - First, encode it in the genome of an individual (possibly making it subject to evolution).

    - Second, associate all individuals of the population with the same predicted class, which is never modified during the running of the algorithm.

    - Third, choose the predicted class most suitable for a rule (in a deterministic way) as soon as the corresponding rule antecedent is formed (e.g. to maximize fitness).

  • 14/30

  • 15/30

    • Generalizing/specializing crossover: the basic idea of this special kind of crossover is to generalize or specialize a given rule, depending on whether it is currently overfitting or underfitting the data.

    • Example: with the Michigan approach (where each individual represents a single rule) using a binary encoding, the generalizing and specializing crossover operators can be implemented as the logical OR and the logical AND, respectively.
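Over bit-vector rule antecedents, the two operators reduce to bitwise OR and AND, as a minimal sketch:

```python
def generalizing_crossover(parent_a, parent_b):
    """Bitwise OR: the child accepts any value accepted by either parent,
    yielding a more general rule."""
    return [a | b for a, b in zip(parent_a, parent_b)]

def specializing_crossover(parent_a, parent_b):
    """Bitwise AND: the child accepts only values accepted by both parents,
    yielding a more specific rule."""
    return [a & b for a, b in zip(parent_a, parent_b)]
```

For parents `[1, 0, 0, 1]` and `[0, 1, 0, 1]`, OR gives `[1, 1, 0, 1]` (broader antecedent) and AND gives `[0, 0, 0, 1]` (narrower antecedent).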

  • 16/30

    Fitness Function for Rule Discovery

    • Let a rule be IF A THEN C.

    • The predictive performance of a rule can be summarized by a 2×2 matrix, sometimes called a confusion matrix:

    TP (True Positives): number of examples satisfying A and C

    FP (False Positives): number of examples satisfying A but not C

    FN (False Negatives): number of examples not satisfying A but satisfying C

    TN (True Negatives): number of examples satisfying neither A nor C

  • 17/30

    • The 2×2 confusion matrix:

                            Actual class
                            C        not C
      Predicted    C        TP       FP
      class        not C    FN       TN

    • Confidence factor: CF = TP / (TP + FP)

    • Completeness measure: COMP = TP / (TP + FN)

    • Fitness = CF × COMP = (TP)(TP) / ((TP + FP)(TP + FN))

    • Fitness = w1 × (CF × COMP) + w2 × Sim, where Sim is a measure of rule simplicity (0 ≤ Sim ≤ 1) and w1 and w2 are user-defined weights.
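The fitness formulas above can be written as a small helper. The defaults w1 = 1 and w2 = 0 are illustrative only, and recover the plain CF × COMP product:

```python
def rule_fitness(tp, fp, fn, tn, sim=1.0, w1=1.0, w2=0.0):
    """Weighted rule fitness from the 2x2 confusion matrix counts.
    sim is the rule-simplicity measure in [0, 1]; tn is unused by the
    formula but kept so the full matrix can be passed in."""
    cf = tp / (tp + fp)     # confidence factor
    comp = tp / (tp + fn)   # completeness measure
    return w1 * (cf * comp) + w2 * sim
```

For example, a rule with TP = 8, FP = 2, FN = 2 has CF = COMP = 0.8 and fitness 0.64 under the default weights.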

  • 18/30

    GA for Clustering

    • A crucial issue in the design of a GA for clustering is deciding what kind of individual representation will be used to specify the clusters.

    - Cluster-description-based representation: each individual explicitly represents the parameters necessary to precisely specify each cluster. The nature of the parameters depends on the shape of the cluster.

  • 19/30

    Centroid/Medoid-based representation

    • In this case each individual represents the coordinates of each cluster's centroid or medoid.

    • A centroid is simply a point in the data space whose coordinates specify the centre of the cluster.

    • A medoid is the data instance which is nearest to the cluster's centroid.

    • The positions of the centroids/medoids and the procedure used to assign instances to clusters implicitly determine the precise shape and size of the clusters.

  • 20/30

    Instance-based representation

    • In this case each individual consists of a string of n elements (genes), where n is the number of data instances. Each gene i, i = 1, ..., n, represents the index (id) of the cluster to which the i-th data instance is assigned. Hence, each gene i can take one out of K values, where K is the number of clusters.

    • Example: suppose that n = 10 and K = 3. The individual {2 1 2 3 3 2 1 1 2 3} corresponds to a candidate clustering where the second, seventh and eighth instances are assigned to cluster 1; the first, third, sixth and ninth instances are assigned to cluster 2; and the other instances are assigned to cluster 3.
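Decoding such an individual into explicit clusters can be sketched as:

```python
def decode_clustering(individual):
    """Map an instance-based individual to {cluster_id: [1-based instance indices]}."""
    clusters = {}
    for i, cluster_id in enumerate(individual, start=1):
        clusters.setdefault(cluster_id, []).append(i)
    return clusters
```

Applied to the slide's example `[2, 1, 2, 3, 3, 2, 1, 1, 2, 3]`, this yields cluster 1 = {2, 7, 8}, cluster 2 = {1, 3, 6, 9}, and cluster 3 = {4, 5, 10}.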

  • 21/30

    Comparison of the various representations

    • In both the centroid/medoid-based and the instance-based representations, clusters are mutually exclusive and exhaustive.

    • The cluster descriptions may have some overlapping, so that an instance may be located within two or more clusters.

    • The instance-based representation has the disadvantage that it does not scale very well for large data sets, since each individual's length is directly proportional to the number of instances being clustered.

    • This representation also involves a considerable degree of redundancy, which may lead to problems in the application of conventional genetic operators. For instance, let n = 4 and K = 2, and consider the individuals {1 2 1 2} and {2 1 2 1}. These two individuals have different gene values in all four genes, but they represent the same candidate clustering solution, i.e., assigning the first and third instances to one cluster and assigning the second and fourth instances to another cluster. They create very different results in crossover.

  • 22/30

    Fitness Evaluation for Clustering

    • The fitness of an individual is a measure of the quality of the clustering represented by the individual.

    • Basic fitness ideas usually involve the following principles:

    - The smaller the intra-cluster (within-cluster) distance, the better the fitness.

    - The larger the inter-cluster (between-cluster) distance, the better the fitness.
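One common way to combine these two principles is as a ratio. The sketch below is an illustration under that assumption, not a formula from the slides: it scores a centroid-based clustering by the minimum distance between centroids divided by the mean within-cluster distance.

```python
import math

def clustering_fitness(points, assignment, centroids):
    """Higher fitness for tight clusters (small intra-cluster distance)
    that are far apart (large inter-cluster distance)."""
    # mean distance from each point to its assigned centroid
    intra = sum(math.dist(p, centroids[c])
                for p, c in zip(points, assignment)) / len(points)
    # smallest distance between any pair of centroids
    inter = min(math.dist(a, b)
                for i, a in enumerate(centroids)
                for b in centroids[i + 1:])
    return inter / (intra + 1e-9)  # epsilon avoids division by zero
```

Many other validity indices (e.g. Davies-Bouldin or silhouette) encode the same two principles in different ways.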

  • 23/30

    Genetic Algorithms (GAs) for Pre-processing

    • "The use of GAs for attribute selection seems natural. The main reason is that the major source of difficulty in attribute selection is attribute interaction, and one of the strengths of GAs is that they usually cope well with attribute interactions."

    • The standard individual representation for attribute selection consists simply of a string of N bits, where N is the number of original attributes and the i-th bit, i = 1, ..., N, can take the value 1 or 0, indicating whether or not, respectively, the i-th attribute is selected.

    + This individual representation is simple, and traditional crossover and mutation operators can be easily applied.

    - However, it has the disadvantage that it does not scale very well with the number of attributes.

  • 24/30

    • An alternative individual representation: each individual represents a candidate attribute subset. A candidate attribute subset can be represented as a string with m binary genes, where m is the number of attributes and each gene can take on a '1' or a '0'.

    • For instance, the individual (0 0 1 0 1), where m = 5, represents a candidate solution where only the 3rd and the 5th attributes are selected.

    • One advantage of this representation is that it scales up better with respect to a large number of original attributes.

    • Crossover and mutation follow the usual procedures.
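Decoding such a bit string into the selected attribute indices is a one-liner:

```python
def selected_attributes(individual):
    """Return the 1-based indices of the attributes whose gene is 1."""
    return [i for i, bit in enumerate(individual, start=1) if bit == 1]
```

Applied to the example above, `selected_attributes([0, 0, 1, 0, 1])` yields `[3, 5]`, the 3rd and 5th attributes.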

  • 25/30

    Fitness Function for Attribute Selection

    • GAs for attribute selection can be roughly divided into two approaches:

    - Wrapper approach: the GA uses the classification algorithm to compute the fitness of individuals.

    - Filter approach: the GA does not use the classification algorithm.
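The two approaches differ only in how an individual is scored. A schematic contrast, in which both scoring inputs (the accuracy callback and the per-attribute relevance scores) are hypothetical stand-ins, not anything defined in the slides:

```python
def wrapper_fitness(individual, evaluate_accuracy):
    """Wrapper approach: fitness comes from the classification algorithm itself.
    evaluate_accuracy is an assumed callback that trains and tests a classifier
    on the selected attribute subset and returns its accuracy."""
    subset = [i for i, bit in enumerate(individual) if bit == 1]
    return evaluate_accuracy(subset)

def filter_fitness(individual, attribute_scores):
    """Filter approach: score the subset from data statistics alone, here a
    precomputed per-attribute relevance score (purely illustrative)."""
    return sum(score for score, bit in zip(attribute_scores, individual) if bit == 1)
```

Wrapper fitness is usually more accurate but far more expensive, since every evaluation retrains the classifier; filter fitness is cheap but ignores how the classifier actually uses the attributes.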

  • 26/30

    Genetic Algorithms (GAs) for Post-processing

    • GAs can be used in the post-processing step when an ensemble of classifiers (e.g. rule sets) has been created. Generating an ensemble of classifiers is a relatively recent trend in machine learning when the primary goal is to maximize predictive accuracy.

    • Generating an ensemble of classifiers is useful since it has been shown that in several cases an ensemble of classifiers has better predictive accuracy than a single classifier.

    • A fitness function may be created using weights for each classifier in the ensemble. (A user may help.) There are also GA schemes to optimize the weights of the classifiers.

    • There is a risk of generating too many classifiers which end up overfitting the training data; hence pruning is sometimes used.

  • 27/30

    Research Problems

    • Discovering surprising rules:

    Evolutionary algorithms seem to have good potential to discover truly surprising rules, due to their ability to cope well with attribute interaction.

    - An interesting research direction is to design new surprisingness measures to evaluate the rules produced by evolutionary algorithms.

  • 28/30

    • Scaling up Evolutionary Algorithms with Parallel Processing:

    In the context of mining very large databases, the vast majority of the processing time of an evolutionary algorithm is spent on evaluating an individual's fitness.

    - One strategy distributes the population individuals across the available processors and computes their fitness in parallel. However, this strategy reduces scalability for large databases.

    - Alternatively, the fitness of each individual is computed in parallel by all processors, with the data being mined partitioned across the processors.

  • 29/30

    EA for KDD may be applied to other domains

    • KDD has a very interdisciplinary nature and uses many different paradigms of knowledge discovery algorithms. This motivates the integration of evolutionary algorithms with other knowledge discovery paradigms.

    • KDD tasks involve some kind of prediction, where generalization performance on a separate test set is much more important than performance on the training set. This principle may be applied to other domains as well.

  • 30/30

    References

    • Alex A. Freitas, "A Review of Evolutionary Algorithms for Data Mining", Computing Laboratory, University of Kent, UK.

    • Sushmita Mitra, Sankar K. Pal, and Pabitra Mitra, "Data Mining in Soft Computing Framework: A Survey", IEEE Transactions on Neural Networks, Vol. 13, No. 1, January 2002.

    • Behrouz Minaei-Bidgoli and William F. Punch, "Using Genetic Algorithms for Data Mining Optimization in an Educational Web-Based System", GARAGe, Department of Computer Science & Engineering, Michigan State University, http://garage.cse.msu.edu

    • Alex Alves Freitas, "Evolutionary Computation", http://www.ppgia.pucpr.br/~alex

    • Sid Bhattacharyya, "Genetic Algorithms For Data Mining", www.uic.edu/classes/idsc/ids572cna/GADataMiningCNA.pdf

    • www.site.uottawa.ca/~nat/Courses/csi5388/.../KimSlides.ppt