genetic algorithm report


Upload: er-vijeta-narwal

Post on 30-Oct-2014


Page 1: Genetic Algorithm Report

A

SEMINAR REPORT

ON

GENETIC ALGORITHM

Submitted to: Er. Richa Dutta, Lecturer

Submitted by: Neh yadav (4108030), Kamini (4108021), CSE-8th Sem

YAMUNA INSTITUTE OF ENGINEERING AND TECHNOLOGY

GADHOLI


Abstract

Genetic algorithms provide heuristic solutions for combinatorial-optimization problems and have been applied successfully in many areas. A genetic algorithm (GA) is an optimization technique for searching very large spaces that models the role of genetic material in living organisms; it is used in computing to find exact or approximate solutions to optimization and search problems. Genetic algorithms are categorized as global search heuristics. They are part of evolutionary computing, a rapidly growing area of artificial intelligence, and use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover. A small population of individual exemplars can effectively search a large space because the individuals contain schemata, useful substructures that can potentially be combined to make fitter individuals. Formal studies of competing schemata show that the best policy for replicating them is to increase them exponentially according to their relative fitness. This turns out to be the policy used by genetic algorithms. Fitness is determined by examining a large number of individual fitness cases. This process can be very efficient if the fitness cases also evolve by their own GAs.


1. Introduction

1.1 A Biology Lesson

 Every organism has a set of rules, a blueprint so to speak, describing how that organism

is built up from the tiny building blocks of life. These rules are encoded in the genes of

an organism, which in turn are connected together into long strings called chromosomes.

Each gene represents a specific trait of the organism, like eye colour or hair colour, and

has several different settings. For example, the settings for a hair colour gene may be

blonde, black or auburn. These genes and their settings are usually referred to as an

organism's genotype. The physical expression of the genotype - the organism itself - is

called the phenotype. When two organisms mate they share their genes. The resultant

offspring may end up having half the genes from one parent and half from the other. This

process is called recombination. Very occasionally a gene may be mutated. Normally this

mutated gene will not affect the development of the phenotype but very occasionally it

will be expressed in the organism as a completely new trait.


1.2 About Genetic Algorithms

Genetic Algorithms are adaptive heuristic search algorithms premised on the evolutionary ideas of natural selection and genetics. The basic concept of Genetic Algorithms is to simulate the processes in natural systems necessary for evolution, specifically those that follow the principle of survival of the fittest first laid down by Charles Darwin. As such they represent an intelligent exploitation of a random search within a defined search space to solve a problem. First pioneered by John Holland in the 1960s, Genetic Algorithms have been widely studied, experimented with and applied in many fields of engineering. Not only do Genetic Algorithms provide an alternative method of solving problems, they can also outperform traditional methods on many of them. Many real-world problems involve finding optimal parameters, which might prove difficult for traditional methods but well suited to Genetic Algorithms. However, because of their strong performance in optimisation, Genetic Algorithms have been wrongly regarded as merely function optimisers. In fact, there are many ways to view genetic algorithms. Perhaps most users come to Genetic Algorithms looking for a problem solver, but this is a restrictive view.


1.3 Brief Overview

Genetic algorithms are inspired by Darwin's theory of evolution. The solution to a problem solved by a genetic algorithm is evolved.

The algorithm starts with a set of solutions (represented by chromosomes) called the population. Solutions from one population are taken and used to form a new population, in the hope that the new population will be better than the old one. Solutions are selected to form new solutions (offspring) according to their fitness: the more suitable they are, the more chances they have to reproduce.

This is repeated until some condition (for example, the number of generations or the improvement of the best solution) is satisfied.

Example

Problem solving can often be expressed as looking for an extreme of a function. This is exactly what the problem shown here is. Some function is given and the Genetic Algorithm tries to find the minimum of the function.


Fig. 1.1 The graph represents a search space; the goal is to travel from the gray cell to the green cell in the fewest steps.


2. Genetic Algorithm

1. [Start] Generate a random population of n chromosomes (suitable solutions for the problem).

2. [Fitness] Evaluate the fitness f(x) of each chromosome x in the population.

3. [New population] Create a new population by repeating the following steps until the new population is complete.

[Selection] Select two parent chromosomes from the population according to their fitness (the better the fitness, the bigger the chance of being selected).

[Crossover] In genetic algorithms, crossover is a genetic operator used to vary the programming of a chromosome or chromosomes from one generation to the next. It is analogous to reproduction and biological crossover, upon which genetic algorithms are based. Crossover is the process of taking more than one parent solution and producing a child solution from them. There are several methods for selecting the chromosomes to cross over. With a crossover probability, cross over the parents to form new offspring (children). If no crossover is performed, the offspring are exact copies of the parents.

Fig 2.1

Figure 2.1 shows the crossover between parent 1 and parent 2. As we can see, the children take one section of the chromosome from each parent. The point at which the chromosome is broken depends on the randomly selected crossover point. This particular method is called single point crossover because only one crossover point exists.

[Mutation] In genetic algorithms, mutation is a genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next. It is analogous to biological mutation. Mutation alters one or more gene values in a chromosome from its initial state, so the solution may change entirely from the previous solution. Hence Genetic Algorithms can arrive at a better solution by using mutation. With a mutation probability, mutate new offspring at each locus (position in the chromosome).

Fig 2.2

After selection and crossover, you now have a new population full of individuals. Some

are directly copied, and others are produced by crossover. In order to ensure that the

individuals are not all exactly the same, you allow for a small chance of mutation.

[Accepting] Place new offspring in a new population

4. [Replace] Use the newly generated population for a further run of the algorithm.

5. [Test] If the end condition is satisfied, stop, and return the best solution in the current population.

6. [Loop] Go to step 2.
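The six steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a definitive implementation: chromosomes are assumed to be lists of bits, the fitness function `one_max` (count of 1 bits) is a stand-in chosen only for the example, selection is assumed to be fitness-proportionate (roulette wheel), and the end condition is assumed to be a fixed number of generations. All function names and default parameter values are hypothetical.

```python
import random

def one_max(chromosome):
    """Illustrative fitness (an assumption): the number of 1 bits."""
    return sum(chromosome)

def select_parent(population, fitnesses, rng):
    """[Selection] Fitness-proportionate (roulette-wheel) choice of one parent."""
    # A tiny offset keeps the weights valid even if every fitness is zero.
    weights = [f + 1e-9 for f in fitnesses]
    return rng.choices(population, weights=weights, k=1)[0]

def single_point_crossover(p1, p2, rng):
    """[Crossover] Swap the tails of two parents at one random crossover point."""
    point = rng.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def mutate(chromosome, rate, rng):
    """[Mutation] Flip each locus independently with probability `rate`."""
    return [1 - g if rng.random() < rate else g for g in chromosome]

def run_ga(n=20, length=16, generations=50, crossover_rate=0.7,
           mutation_rate=0.01, fitness=one_max, seed=0):
    rng = random.Random(seed)
    # Step 1 [Start]: random population of n chromosomes.
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(n)]
    # Steps 5 and 6 [Test]/[Loop]: here the end condition is a generation budget.
    for _ in range(generations):
        # Step 2 [Fitness]
        fitnesses = [fitness(c) for c in population]
        # Step 3 [New population]
        new_population = []
        while len(new_population) < n:
            p1 = select_parent(population, fitnesses, rng)
            p2 = select_parent(population, fitnesses, rng)
            if rng.random() < crossover_rate:
                c1, c2 = single_point_crossover(p1, p2, rng)
            else:
                c1, c2 = p1[:], p2[:]          # no crossover: exact copies
            # [Accepting] place the (possibly mutated) offspring in the new population
            new_population.append(mutate(c1, mutation_rate, rng))
            new_population.append(mutate(c2, mutation_rate, rng))
        # Step 4 [Replace]
        population = new_population[:n]
    return max(population, key=fitness)

best = run_ga()
print(best, one_max(best))
```

Because the best individual is not explicitly carried over between generations in this sketch, the fitness of the returned chromosome is not guaranteed to be the best ever seen; keeping a copy of the fittest individual (elitism) is a common variation.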

3. Who can benefit from Genetic Algorithms


Nearly everyone can benefit from Genetic Algorithms, provided they can encode solutions of a given problem as chromosomes and compare the relative performance (fitness) of solutions.

An effective Genetic Algorithm representation and a meaningful fitness evaluation are the keys to success in Genetic Algorithm applications.

The appeal of Genetic Algorithms comes from their simplicity and elegance as robust search algorithms, as well as from their power to discover good solutions rapidly for difficult high-dimensional problems.

Genetic Algorithms are particularly useful when:

The search space is large, complex or poorly understood.

Domain knowledge is scarce or expert knowledge is difficult to encode to narrow the search space.

No mathematical analysis is available.

Traditional search methods fail.

Genetic Algorithms have been used for problem-solving and for modeling. They are applied to many scientific and engineering problems, as well as in business and entertainment, including the traveling salesman problem.


4. Applications

4.1. Automotive Design

Using Genetic Algorithms to design both composite materials and aerodynamic shapes for race cars and regular means of transportation (including aviation) can return combinations of the best materials and the best engineering to provide faster, lighter, more fuel-efficient and safer vehicles for all the things we use vehicles for.

Rather than spending years in laboratories working with polymers, wind tunnels and balsa wood shapes, the processes can be done much more quickly and efficiently by computer modeling, using Genetic Algorithm searches to return a range of options that human designers can then put together however they please.


4.2 Engineering Design

Getting the most out of a range of materials to optimize the structural and

operational design of buildings, factories, machines, etc. is a rapidly expanding

application of Genetic Algorithms. These are being created for such uses as

optimizing the design of heat exchangers, robot gripping arms, satellite booms,

building trusses, flywheels, turbines, and just about any other computer-assisted

engineering design application.

There is work on combining Genetic Algorithms that optimize particular aspects of an engineering problem so that they work together. Some of these can not only solve design problems but also project them forward to analyze weaknesses and possible points of failure, so that these can be avoided.


4.3 Robotics

Robotics involves human designers and engineers trying out all sorts of things in

order to create useful machines that can do work for humans. Each robot's design

is dependent on the job or jobs it is intended to do, so there are many different

designs out there.

Genetic Algorithms can be programmed to search for a range of optimal designs

and components for each specific use, or to return results for entirely new types of

robots that can perform multiple tasks and have more general application.

Genetic Algorithm-designed robotics just might get us those nifty multi-purpose, learning robots we've been expecting any year now since we watched the Jetsons as kids, which will cook our meals, do our laundry and even clean the bathroom for us!


4.4 Optimized Telecommunications Routing

Do you find yourself frustrated by slow LAN performance, inconsistent internet

access, a FAX machine that only sends faxes sometimes, your land line's number

of 'ghost' phone calls every month? Well, Genetic Algorithms are being

developed that will allow for dynamic and anticipatory routing of circuits for

telecommunications networks.

These could take notice of your system's instability and anticipate your re-routing needs. With more than one Genetic Algorithm circuit search running at a time, soon your interpersonal communications problems may really be all in your head rather than in your telecommunications system.

Other Genetic Algorithms are being developed to optimize the placement and routing of cell towers for best coverage and ease of switching, so your cell phone and BlackBerry will be thankful for Genetic Algorithms too.


4.5 Trip, Traffic and Shipment Routing

New applications of Genetic Algorithms to the "Traveling Salesman Problem" (TSP) can be used to plan the most efficient routes and schedules for travel planners, traffic routers and even shipping companies: the shortest routes for traveling, the timing to avoid traffic tie-ups and rush hours, and the most efficient use of transport for shipping, even including pickup loads and deliveries along the way. The program can model all this in the background while the human agents do other things, improving productivity as well! Chances are increasing steadily that when you get that trip plan packet from the travel agency, a Genetic Algorithm contributed more to it than the agent did.

6. Cancer gene search with data-mining and Genetic Algorithms

6.1 Introduction

Cancer accounts for approximately 25% of all mortalities, making it the second leading cause of death in the United States. Early and accurate detection of cancer is critical to the well-being of patients. Analysis of gene expression data supports cancer identification and classification, which facilitates proper treatment selection and drug development. Gene expression data sets for ovarian, prostate, and lung cancer were analyzed in this research. An integrated gene-search algorithm for genetic expression data analysis was proposed. This integrated algorithm involves a genetic algorithm and correlation-based heuristics for data preprocessing (on partitioned data sets) and data mining (decision tree and support vector machine algorithms) for making predictions. Knowledge derived by the proposed algorithm has high classification accuracy with the ability to identify the most significant genes.

Cancer develops mainly in epithelial cells, connective/muscle tissue (sarcomas), and white blood cells. It results from successive mutations in a normal cell that damage the DNA and impair the cell replication mechanism. There are a number of carcinogens, such as tobacco smoke, radiation, certain microbes, synthetic chemicals, polluted water, and air, that may accelerate the mutations. Thus, there is a need to identify the mutated genes that contribute to a cancerous state. One of the methods for cancer identification is the analysis of genetic data. The human genome contains approximately 10 million single nucleotide polymorphisms. These single nucleotide polymorphisms are responsible for the variation that exists between human beings. Due to the high cost, genetic data (containing as many as 15,000 genes per patient) is normally collected on a limited number of patients (100 to 300 patients). There is a need to select the most informative genes from such wide data sets. Removal of uninformative genes decreases noise, confusion, and complexity, and increases the chances of identifying the most important genes, classifying diseases, and predicting various outcomes, e.g., cancer type.

A genetic algorithm is a search algorithm based on the concept of natural genetics. A genetic algorithm is initiated with a set of solutions (chromosomes) called the population. Each solution in the population is evaluated based on its fitness. Solutions chosen to form new chromosomes (offspring) are selected according to their fitness, i.e., the more suitable the solution, the higher the likelihood it will reproduce. This is repeated until some condition (for example, the number of generations or the quality of the best solution) is satisfied. A genetic algorithm searches the solution space without following crisp constraints and takes into account potentially all feasible solution regions. This provides a chance of searching previously unexplored regions, and there is a high possibility of achieving an overall optimal or near-optimal solution, making the genetic algorithm a global search algorithm.

6.2 Integrated algorithm

The integrated gene-search algorithm consists of two phases. The iterative Phase I includes data partitioning, application of the Decision Tree algorithm (or other data-mining algorithms) to the partitioned data set, the genetic algorithm, and the correlation-based heuristics for gene reduction. The set of significant genes is utilized in Phase II for validation of the quality of the genes. A data-mining (i.e., classification) algorithm takes a training expression data set as input and predicts whether a test sample is normal or cancerous. Thus, data-mining algorithms are applied to the training and testing data sets and their results are evaluated to determine the most significant gene set.

In Phase I, the cancer training gene data set is initially partitioned into several subsets with approximately 1000 genes in each subset (Fig. 6.1). The partitioning of the data sets can be performed arbitrarily or randomly. The Decision Tree algorithm is applied to each partitioned data set to determine the classification accuracy. The total number of genes selected (most significant as well as medium significant genes) from all the partitioned data sets is an overestimate of the actual significant gene set. The genes selected from all the partitioned data sets are merged to form a single gene set (Fig. 6.1). If the current gene set is larger than the user-defined threshold (e.g., 1000 genes), then the gene set is re-partitioned for the next iteration of the data-mining and GA-CFS (Genetic Algorithm-Correlation Based Feature Selection) algorithms. Phase I is repeated until the number of significant genes is less than the threshold. To further reduce the number of genes, the GA-CFS algorithm can be re-applied to the reduced gene data sets.
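The control flow of Phase I (partition, select within each subset, merge, repeat while the merged set is still too large) can be sketched in Python. This is only an outline under assumptions: `select_genes` is a hypothetical callable standing in for the data-mining and GA-CFS step, which the report does not specify in code, and `max_iterations` is an added safeguard that is not part of the described algorithm.

```python
def partition(genes, subset_size):
    """Split the gene list into consecutive subsets of at most subset_size genes."""
    return [genes[i:i + subset_size] for i in range(0, len(genes), subset_size)]

def phase_one(genes, select_genes, subset_size=1000, threshold=1000,
              max_iterations=50):
    """Repeat partition -> per-subset selection -> merge until the merged
    gene set is no larger than the threshold.

    select_genes(subset) is a hypothetical placeholder for the data-mining
    and GA-CFS step; it returns the genes it considers significant.
    """
    current = list(genes)
    # max_iterations guards against a selector that never shrinks the set.
    for _ in range(max_iterations):
        merged = []
        for subset in partition(current, subset_size):
            merged.extend(select_genes(subset))
        current = merged
        if len(current) <= threshold:
            break
    return current

# Toy selector (an assumption for illustration): keep every other gene.
reduced = phase_one(list(range(5000)), lambda subset: subset[::2])
print(len(reduced))
```

With the toy selector, each pass halves every subset, so the loop needs several passes before the merged set drops below the threshold; a real selector would be driven by classification accuracy and feature correlation instead.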

In Phase II, data-mining algorithms such as the Decision Tree and Support Vector Machine algorithms are then applied to the training data set using only the significant genes (Fig. 6.1). The classification accuracy obtained from this reduced gene data set is not smaller than the maximum classification accuracy from the previous partitioned data sets. This step validates that the proposed gene selection algorithm preserves the information/knowledge.


[Figure: flowchart. Phase I: the complete data set for cancer is partitioned into data sets of 1000 genes each; data mining and GA-CFS are applied to each data set; a gene set is identified and, if it contains more than 1000 genes, it is partitioned again. Phase II: data mining is applied to the most significant genes, giving training results and testing results.]

Fig. 6.1 Integrated gene-search algorithm


6.3 Conclusion

The integrated gene-search algorithm (Genetic Algorithm-Correlation Based Feature

Selection algorithm with data mining) was proposed and successfully applied to the

training and test genetic expression data sets of ovarian, prostate, and lung cancers. This

uniformly applicable algorithm not only provided high classification accuracy but also

determined a set of the most significant genes for each of the three cancers. These gene

sets require further investigation for their medical relevance, as the prediction power

attained from these gene sets is statistically equivalent to that reported in the literature.

The integrated gene-search algorithm is capable of identifying significant genes by

partitioning the data set with a correlation-based heuristic. The overestimate of the actual

significant gene set using this algorithm allows the investigation of potentially useful

genes or their combinations. This leads to multiple models and supports the underlying

hypothesis that genetic expression data sets can be used in diagnosis of various cancers.


5. Genetic Algorithm Problems

5.1 The Algorithm

A genetic algorithm can be thought of as a search. Given some initial state, the algorithm

is searching for an optimal state. It does this in a way that mimics nature (hence the

name). Say you have a population of a certain species. The first generation of these

creatures may not be optimally suited for their environment. Over time the individuals

who are less suited die off while those that are well suited reproduce and dominate the

others. In addition to reproduction between well suited individuals (cross-over in the

context of a Genetic Algorithm) the offspring of those individuals experience mutation,

meaning that the child of individuals A and B is not purely a cross between the two, but

has its own unique traits. Generally mutation occurs at a low probability.

In the context of programming, a Genetic Algorithm can be expressed as a function that

takes as input a population and a fitness function. The population is a collection of

individuals and the fitness function is a means of determining how fit an individual is. At

generation zero the population is usually randomly generated. In order to get from

generation zero to generation 1, the algorithm uses the fitness function to determine

which individuals to include in the cross-over (reproduce), leaving out the rest. The

children of those individuals are then passed to a mutate function, that alters them in

some way, usually at a very low probability. Here’s some pseudo code that might help to

understand how this might be implemented:
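A minimal Python sketch of the function described in this section, written under assumptions: the crossover and mutate operators are supplied by the caller (their signatures are hypothetical), the fitter half of the population is assumed to be the part chosen to reproduce, and the run is assumed to stop after a fixed number of generations.

```python
import random

def genetic_algorithm(population, fitness_fn, crossover, mutate,
                      generations=100, mutation_rate=0.01, seed=None):
    """Evolve `population` and return the fittest individual seen.

    crossover(a, b, rng) -> child and mutate(child, rng) -> child are
    caller-supplied (hypothetical) operators; fitness_fn(ind) -> number.
    The population must contain at least two individuals.
    """
    rng = random.Random(seed)
    population = list(population)
    best = max(population, key=fitness_fn)
    for _ in range(generations):
        # Use the fitness function to decide who reproduces:
        # here the fitter half is kept and the rest is left out.
        ranked = sorted(population, key=fitness_fn, reverse=True)
        parents = ranked[:max(2, len(ranked) // 2)]
        children = []
        while len(children) < len(population):
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:   # mutation at a low probability
                child = mutate(child, rng)
            children.append(child)
        population = children
        best = max(population + [best], key=fitness_fn)
    return best
```

A caller would pass a generation-zero population (usually random) together with problem-specific operators, for example bit tuples with `sum` as the fitness function.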


5.2 How They Are Used

There are a variety of problems that can be solved with genetic algorithms. Genetic Algorithms are particularly well suited to optimization problems. K-SATISFIABILITY problems, for example, can be solved with a genetic algorithm (though other means exist).

For anyone not familiar with K-SATISFIABILITY problems, I’ll give a short explanation. SATISFIABILITY (or satisfaction) problems attempt to assign values to the variables of a boolean formula in such a way that it evaluates to true. So if my SATISFIABILITY problem consists of two variables, A and B, and one clause, A OR B, then one solution would be A = true, B = true.

Clauses are the components of the boolean formula; in the example I gave, the formula consists of only one clause. A larger SATISFIABILITY problem may consist of hundreds of variables and thousands of clauses and cannot be solved on paper in a reasonable amount of time. Here is an example of a larger SATISFIABILITY problem:

(A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C)

This formula consists of three variables (A, B, C) and three clauses. A solution to

this problem would be A = true, B = true, C = false. Notice that there are many

different assignments of these variables that satisfy the formula. If there were

more clauses this might not be the case.
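For a formula this small, the claim can be checked by brute force. The sketch below simply tries all eight assignments; the function name `formula` is chosen for the example only.

```python
from itertools import product

def formula(a, b, c):
    """(A OR B OR C) AND (A OR !B OR !C) AND (!A OR B OR !C)"""
    return (a or b or c) and (a or not b or not c) and (not a or b or not c)

# Try every combination of True/False for the three variables.
satisfying = [assignment for assignment in product([True, False], repeat=3)
              if formula(*assignment)]

for a, b, c in satisfying:
    print(f"A={a}, B={b}, C={c}")
```

Running it lists five of the eight possible assignments, including A = true, B = true, C = false. Exhaustive checking like this stops being practical quickly, since the number of assignments doubles with every added variable.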

To solve a SATISFIABILITY problem with a genetic algorithm you start off with a population of randomly generated “solutions”, each solution consisting of a random assignment of true or false to each variable. This population is generation zero. In this context the fitness function is defined as the number of satisfied (or unsatisfied) clauses in the boolean formula.
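As a sketch of generation zero and the fitness function, the three-clause formula from the example above can be encoded as data. The clause encoding (a list of variable/wanted-value pairs) and the names `CLAUSES`, `fitness` and `random_population` are illustrative assumptions, and fitness here counts satisfied clauses.

```python
import random

# Each clause is a list of (variable, wanted_value) literals; a literal such
# as !B is written ("B", False). This encoding is an illustrative choice.
CLAUSES = [
    [("A", True), ("B", True), ("C", True)],     # (A OR B OR C)
    [("A", True), ("B", False), ("C", False)],   # (A OR !B OR !C)
    [("A", False), ("B", True), ("C", False)],   # (!A OR B OR !C)
]

def fitness(assignment, clauses=CLAUSES):
    """Count the clauses satisfied by `assignment` (a dict of variable -> bool)."""
    return sum(
        any(assignment[var] == wanted for var, wanted in clause)
        for clause in clauses
    )

def random_population(size, variables=("A", "B", "C"), seed=None):
    """Generation zero: `size` random true/false assignments."""
    rng = random.Random(seed)
    return [{v: rng.choice([True, False]) for v in variables}
            for _ in range(size)]

generation_zero = random_population(6, seed=0)
for individual in generation_zero:
    print(individual, fitness(individual))
```

An individual whose fitness equals the number of clauses satisfies the whole formula.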


Using the fitness function, for each individual in generation zero, a fitness value is

determined. It might be the case that one of these individuals satisfies the formula,

in which case you’re done. Otherwise, in order to get from generation zero to

generation one, we must choose a portion of the population to “reproduce”, for

example, those having a fitness above the average.

Once we’ve made our selection we perform the cross over by producing a new individual with a portion of its assignments coming from each parent (the size of the portion may be determined randomly). For example, for individuals X and Y with X(A,B,C) = {True, False, False} and Y(A,B,C) = {False, True, True}, a possible child would be Child(A,B,C) = {False, False, False}, where A comes from Y and B and C come from X.
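The example child can be reproduced with a single crossover point. In this sketch individuals are assumed to be tuples ordered as (A, B, C), and the function names are chosen for illustration.

```python
import random

def crossover(first, second, point):
    """Child takes first[:point] from one parent and second[point:] from the other."""
    return first[:point] + second[point:]

def random_crossover(first, second, rng=random):
    """Pick the size of each parent's portion at random."""
    point = rng.randint(1, len(first) - 1)
    return crossover(first, second, point)

X = (True, False, False)    # X(A, B, C)
Y = (False, True, True)     # Y(A, B, C)

child = crossover(Y, X, 1)  # A from Y, then B and C from X
print(child)
```

Other ways of mixing the parents, such as choosing each variable's source independently, are equally valid forms of crossover.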

After we’ve generated a new population we then randomly mutate each individual at a very low probability. At probabilities above 5% in many cases a solution will not be found in a reasonable amount of time. A mutation takes an assignment and flips it. So for the individual X(A,B,C) = {True, False, False}, if a mutation event occurs on the variable B, it will become X(A,B,C) = {True, True, False}. Without this mutation the algorithm does not approach a solution.
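A sketch of the flip described here, again assuming individuals are tuples ordered as (A, B, C). The default rate of 1% is an arbitrary illustrative value, and the choice to flip exactly one randomly chosen variable per mutation event is an assumption.

```python
import random

def flip(individual, index):
    """Return a copy of `individual` with the assignment at `index` flipped."""
    return individual[:index] + (not individual[index],) + individual[index + 1:]

def mutate(individual, rate=0.01, rng=random):
    """With probability `rate`, flip one randomly chosen assignment."""
    if rng.random() < rate:
        return flip(individual, rng.randrange(len(individual)))
    return individual

X = (True, False, False)   # X(A, B, C)
mutated = flip(X, 1)       # mutation event on variable B
print(mutated)
```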

At Generation zero for a large problem, there is very little chance of a solution

existing. After each passing generation, however, the average fitness increases

and it becomes likely that an individual satisfies the formula.

5.3 Problems with Genetic Algorithms

After each generation the individuals of a population begin to approach the

solution. In the context of a SATISFIABILITY problem this means they satisfy

more and more clauses. There is, however, no guarantee that they will ever satisfy

all of them. This is because individuals that have a fitness near the maximum,

may actually be very different from the solution.

For example, say a SATISFIABILITY problem has the solution 000011000, where each character in the bit string represents a variable, the 0s represent false, and the 1s represent true. The string 111100111 might satisfy 90% of the clauses. If this is the case, the children produced by this individual will look similar to it, and the likelihood of it being mutated into the solution is essentially zero. The following graph illustrates this problem:

Local Max Problem

From the graph you can see that there are two peaks, one reaching 100, the other 75. The higher one represents the solution to the problem, while the other is called a local maximum. A genetic algorithm may reach the peak of a local maximum and become stuck because all similar solutions have a lower fitness, while the actual solution is dissimilar to the current state.
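The effect is easiest to see with a plain hill climber rather than a full genetic algorithm, so the sketch below swaps in that simpler search. The landscape is a made-up function with a local peak of 75 and a global peak of 100, chosen only to mirror the two peaks described; the positions of the peaks are assumptions.

```python
def fitness(x):
    """Toy landscape on 0..100: local peak of 75 at x=20, global peak of 100 at x=80."""
    return max(75 - abs(x - 20), 100 - 2 * abs(x - 80))

def hill_climb(x, lo=0, hi=100):
    """Move to the better neighbour until no neighbour improves the fitness."""
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(x):
            return x          # every similar state is no better: stuck here
        x = best

stuck = hill_climb(10)    # starts on the slope of the lower peak
found = hill_climb(60)    # starts on the slope of the higher peak
print(stuck, fitness(stuck))
print(found, fitness(found))
```

The search that starts near the lower peak stops there, because every nearby state has a lower fitness, even though a much better state exists elsewhere.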

5.4 Possible Solutions

A possible way to fix this problem would be to reset the search: generate a new set of random solutions, as the algorithm did at generation zero, and proceed from there. This is called a random-reset. Hopefully after the reset the search will approach the solution rather than a local max.
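The idea of a random-reset can be sketched with a simple hill climber standing in for the genetic algorithm, on a made-up two-peak landscape (local peak 75, global peak 100). Both the landscape and the function names are assumptions for illustration; in a real genetic algorithm the reset would regenerate the whole population rather than a single starting point.

```python
import random

def fitness(x):
    """Toy landscape on 0..100: local peak of 75 at x=20, global peak of 100 at x=80."""
    return max(75 - abs(x - 20), 100 - 2 * abs(x - 80))

def hill_climb(x, lo=0, hi=100):
    """Move to the better neighbour until no neighbour improves the fitness."""
    while True:
        neighbours = [n for n in (x - 1, x + 1) if lo <= n <= hi]
        best = max(neighbours, key=fitness)
        if fitness(best) <= fitness(x):
            return x
        x = best

def search_with_resets(starts):
    """Run one search per starting point and keep the best end state."""
    return max((hill_climb(s) for s in starts), key=fitness)

rng = random.Random(0)
starts = [rng.randint(0, 100) for _ in range(10)]   # each reset draws a fresh random state
best = search_with_resets(starts)
print(best, fitness(best))
```

A single run from an unlucky start ends at the lower peak; with several resets it becomes likely, though not certain, that at least one run starts on the slope of the higher peak.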

Another similar solution would be to mutate each individual in the current population at a much higher rate, possibly 100%. This would produce a population that is very different from the one that existed at the local maximum.

These solutions would fix the problem in a case where there were only a few local maxima, but for some problems it might be the case that there are numerous local maxima. For these problems, genetic algorithms with random-reset might find solutions that have very high fitness, but never the solution.


7. CONCLUSION & FUTURE SCOPE

A genetic algorithm is a probabilistic optimization technique modeled on the process of genetic evolution in biology, and it is regarded as an effective algorithm for finding a global optimum solution for many types of problem. The algorithm is applicable in a range of artificial intelligence approaches as well as in other areas such as object-oriented technology and robotics. In future we shall concentrate on the development of hybrid approaches using genetic algorithms and object-oriented technology.

Genetic Algorithms are good at taking larger, potentially huge search spaces and

navigating them looking for optimal combinations of things and solutions which we

might never be able to find. The use of genetic algorithms to solve large and often

complex computational problems has given rise to many new applications in a variety of

disciplines. They have discovered powerful, high quality solutions to difficult practical

problems in a diverse variety of fields.

