Project 2: Classification Using Genetic Programming
2008. 10. 27
Kim, MinHyeok
[email protected] Biointelligence laboratory
Artificial Intelligence
Contents
Project outline
  Description of the data set
Genetic Programming
  Brief overview
  Fitness function & Selection methods
  Classification with GP (in this project)
Guide to writing reports
  Style & contents
Submission guide / Marking scheme
2(C) 2008, SNU Biointelligence Laboratory
Outline
Goal
  Understand Genetic Programming (GP) in more depth
  Practice researching and writing a paper
Forest Fires problem (classification)
  To predict whether a fire occurs or not
  Using Genetic Programming
  Estimating several statistics on the dataset
Data set
  Variation of the ‘Forest Fires data set’
  http://archive.ics.uci.edu/ml/datasets/Forest+Fires
Forest Fires Data Set
Description
  Database of 517 samples
    You can use at most 500 samples for training
    17 samples for prediction
  12 attributes plus the class label
    X, Y, month, day, FFMC, DMC, DC, ISI, temp, RH, wind, rain, label
    Integer or real value
  Label (Class)
    Two classes
    – 0 : a fire does not occur
    – 1 : a fire occurs
Brief Summary of GP
A kind of evolutionary algorithm
Each individual is represented with a tree structure
You need to set up the following elements for a GP run
  The set of terminals (input attributes, the class variable, constants)
  The set of functions (numerical / condition operators)
  The fitness measure
  The algorithm parameters
    population size, maximum number of generations
    crossover rate and mutation rate
    maximum depth of GP trees, etc.
  The method for designating a result and the criterion for terminating a run
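To make the terminal set and the function set concrete, here is a minimal sketch that stores a GP individual as a nested tuple and evaluates it on one sample. The operator names, the protected division, the constants, and the example tree are all assumptions made for illustration; none of them come from the slides or from the recommended libraries.

```python
import operator

def protected_div(a, b):
    """Division that returns 1.0 when the denominator is (near) zero."""
    return a / b if abs(b) > 1e-9 else 1.0

# Function set: name -> callable (an assumed choice of binary operators).
FUNCTIONS = {
    "add": operator.add,
    "sub": operator.sub,
    "mul": operator.mul,
    "div": protected_div,
}

# Terminal set: some input attributes of the data set plus assumed constants.
TERMINALS = ["FFMC", "DMC", "DC", "ISI", "temp", "RH", "wind", "rain", 1.0, 50.0]

def evaluate(tree, sample):
    """Recursively evaluate a tree on a sample (dict: attribute -> value)."""
    if isinstance(tree, tuple):      # internal node: (function name, child, child)
        func = FUNCTIONS[tree[0]]
        return func(*(evaluate(child, sample) for child in tree[1:]))
    if isinstance(tree, str):        # terminal: input attribute
        return sample[tree]
    return tree                      # terminal: numeric constant

# Hypothetical tree for (temp - RH / 50) + wind, evaluated on the
# temp, RH and wind values of test sample Data01.
tree = ("add", ("sub", "temp", ("div", "RH", 50.0)), "wind")
sample = {"temp": 26.4, "RH": 21, "wind": 4.5}
value = evaluate(tree, sample)
```

Protected division is a common way to keep every randomly generated tree evaluable; a GP library may handle this differently.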
GP Flowchart
[Figure: flowcharts of the GA loop and the GP loop]
Initialization
Maximum initial depth of trees Dmax is set.
Full method (each branch has depth = Dmax):
  nodes at depth d < Dmax randomly chosen from function set F
  nodes at depth d = Dmax randomly chosen from terminal set T
Grow method (each branch has depth ≤ Dmax):
  nodes at depth d < Dmax randomly chosen from F ∪ T
  nodes at depth d = Dmax randomly chosen from T
Common GP initialisation: ramped half-and-half, where the grow and full methods each deliver half of the initial population
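The full and grow rules above can be sketched as follows. This is an illustration under stated assumptions, not the code of any particular library: trees are nested tuples, the function and terminal sets are made up, and the depth limit is ramped from 2 up to Dmax across the population, which is a common convention that the slide does not spell out.

```python
import random

FUNCTIONS = {"add": 2, "sub": 2, "mul": 2}        # name -> arity (assumed set)
TERMINALS = ["temp", "RH", "wind", "rain", 1.0]   # assumed terminal set

def gen_tree(depth, limit, method, rng):
    """Build a random tree as nested tuples; the root is at depth 0."""
    if depth == limit:
        return rng.choice(TERMINALS)                        # d = Dmax: T only
    if method == "grow":
        choice = rng.choice(list(FUNCTIONS) + TERMINALS)    # d < Dmax: F ∪ T
    else:
        choice = rng.choice(list(FUNCTIONS))                # d < Dmax: F only
    if choice not in FUNCTIONS:
        return choice                                       # grow picked a terminal
    children = tuple(gen_tree(depth + 1, limit, method, rng)
                     for _ in range(FUNCTIONS[choice]))
    return (choice,) + children

def depth_of(tree):
    """Depth of the deepest leaf (a lone terminal has depth 0)."""
    if not isinstance(tree, tuple):
        return 0
    return 1 + max(depth_of(child) for child in tree[1:])

def ramped_half_and_half(pop_size, max_depth, rng):
    """Alternate full and grow; ramp the depth limit over 2..max_depth (max_depth >= 2)."""
    population = []
    for i in range(pop_size):
        limit = 2 + (i // 2) % (max_depth - 1)
        method = "full" if i % 2 == 0 else "grow"
        population.append(gen_tree(0, limit, method, rng))
    return population

rng = random.Random(0)
population = ramped_half_and_half(pop_size=20, max_depth=3, rng=rng)
```

Trees built with the full method always reach their depth limit on every branch, while grow trees can stop earlier, so the initial population mixes shapes and sizes.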
Fitness Functions
Relative squared error
  $\text{Fitness} = \sum_{i=1}^{n} \left( \frac{y_i - \hat{f}(x_i)}{y_i} \right)^2$
The number of outputs that are within a given % of the correct value
You can also try other well-defined fitness functions that suit the problem
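Both fitness measures can be written in a few lines. The sketch below is only an illustration: the function names and the toy targets are assumptions. Note that the relative error is undefined when a target y_i is 0, so with 0/1 class labels it would need a different target encoding or an absolute error; the example therefore uses nonzero targets.

```python
def relative_squared_error(targets, outputs):
    """Sum over samples of ((y_i - f(x_i)) / y_i)^2; lower is better."""
    return sum(((y - out) / y) ** 2 for y, out in zip(targets, outputs))

def hits_within_percent(targets, outputs, percent):
    """Count outputs within `percent` % of the correct value; higher is better."""
    return sum(abs(out - y) <= abs(y) * percent / 100.0
               for y, out in zip(targets, outputs))

# Made-up targets and model outputs, for illustration only.
targets = [2.0, 4.0, 10.0]
outputs = [1.0, 4.0, 11.0]
rse = relative_squared_error(targets, outputs)       # 0.25 + 0.0 + 0.01
hits = hits_within_percent(targets, outputs, 20.0)   # second and third outputs count
```

The first measure is minimised and the second is maximised, so the selection step has to know which direction counts as better.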
Selection methods (1/2)
Fitness proportional (roulette wheel) selection
The roulette wheel can be constructed as follows.
  Calculate the total fitness for the population (f(i_k) is the fitness of the k-th chromosome).
    $F = \sum_{k=1}^{POP\_SIZE} f(i_k)$
  Calculate selection probability p_k for each chromosome v_k.
    $p_k = \frac{f(i_k)}{F}, \quad k = 1, 2, \ldots, POP\_SIZE$
  Calculate cumulative probability q_k for each chromosome v_k.
    $q_k = \sum_{j=1}^{k} p_j, \quad k = 1, 2, \ldots, POP\_SIZE$
Procedure: Proportional_Selection
  Generate a random number r from the range [0,1].
  If r ≤ q_1, then select the first chromosome v_1; else, select the kth chromosome v_k (2 ≤ k ≤ pop_size) such that q_{k-1} < r ≤ q_k.
Example
  $F = \sum_{k=1}^{pop\_size} f(i_k) = 0.036441$
  k    p_k        q_k
  1    0.082407   0.082407
  2    0.110652   0.193059
  3    0.131931   0.324989
  4    0.121423   0.446412
  5    0.072597   0.519009
  6    0.128834   0.647843
  7    0.077959   0.725802
  8    0.102013   0.827802
  9    0.083663   0.911479
  10   0.088521   1.000000
  [Figure: the example selection probabilities, with an axis running from 0 to 1]
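The wheel construction and the Proportional_Selection procedure can be sketched as below. The function names and the toy fitness values are assumptions for illustration; the sketch also assumes non-negative fitness values where larger means better.

```python
import bisect
import itertools
import random

def build_wheel(fitnesses):
    """Return the cumulative probabilities q_k for raw fitness values."""
    total = sum(fitnesses)                       # F
    probs = [f / total for f in fitnesses]       # p_k
    return list(itertools.accumulate(probs))     # q_k

def spin(wheel, r):
    """Return the 0-based index of the first q_k with r <= q_k."""
    k = bisect.bisect_left(wheel, r)
    return min(k, len(wheel) - 1)                # guard against rounding in the last q_k

# Made-up fitness values; the wheel is roughly [0.1, 0.4, 0.8, 1.0].
fitnesses = [1.0, 3.0, 4.0, 2.0]
wheel = build_wheel(fitnesses)

rng = random.Random(0)
parents = [spin(wheel, rng.random()) for _ in range(5)]
```

Using a binary search over the cumulative probabilities gives the same result as scanning q_1, q_2, ... in order, but scales better with the population size.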
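Selection methods (2/2)
Tournament selection
  Tournament size q
Ranking-based selection
  $p_i = \frac{1}{POP\_SIZE} \left( \eta^{+} - (\eta^{+} - \eta^{-}) \frac{i - 1}{POP\_SIZE - 1} \right)$, where i is the rank (i = 1 for the best individual)
  $1 \le \eta^{+} \le 2$ and $\eta^{-} = 2 - \eta^{+}$
Elitism
  To preserve n good solutions until the next generation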
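Tournament selection and elitism combine naturally when building the next generation. The following is a minimal sketch under assumptions: higher fitness is better, contestants are drawn without replacement, and the individuals are placeholder strings rather than GP trees. Crossover and mutation are left out.

```python
import random

def tournament_select(population, fitnesses, q, rng):
    """Pick q distinct individuals at random and return the fittest of them."""
    contestants = rng.sample(range(len(population)), q)
    winner = max(contestants, key=lambda i: fitnesses[i])
    return population[winner]

def elites(population, fitnesses, n):
    """Return the n individuals with the highest fitness."""
    order = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)
    return [population[i] for i in order[:n]]

# Placeholder individuals and made-up fitness values.
population = ["a", "b", "c", "d", "e"]
fitnesses = [0.2, 0.9, 0.5, 0.1, 0.7]

rng = random.Random(1)
next_gen = elites(population, fitnesses, n=1)      # the best individual survives unchanged
while len(next_gen) < len(population):
    next_gen.append(tournament_select(population, fitnesses, q=3, rng=rng))
```

A larger tournament size q increases the selection pressure: with q = 3 out of 5 individuals, the two weakest can never win a tournament.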
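Classification with GP (in this project)
Function Regression
  Search a function f(x) s.t.
    f(x) ≥ threshold t when y=1
    f(x) < threshold t when y=0
Converting to Boolean value
  [Figure: example GP trees. The nodes shown are the logical operators ∧, ¬, ∨, the comparisons =, >, <, an arithmetic +, the terminals rain, RH, wind, FFMC, ISI, and the constants 0 and 50. An IF node tests f(x) > t and returns 1 or 0.]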
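The thresholding rule can be sketched in a few lines. Here `f` is a hand-written stand-in for an evolved expression, and the labels are made up for illustration; only the temp, RH and wind values are taken from the test data (Data01, Data09, Data03).

```python
def classify(f, x, t):
    """Return 1 (fire) when f(x) >= t, otherwise 0 (no fire)."""
    return 1 if f(x) >= t else 0

def accuracy(f, samples, labels, t):
    """Fraction of samples whose thresholded output matches the label."""
    correct = sum(classify(f, x, t) == y for x, y in zip(samples, labels))
    return correct / len(samples)

# Hypothetical evolved expression; a real GP run would produce its own tree.
def f(x):
    return x["temp"] - 0.2 * x["RH"] + x["wind"]

samples = [
    {"temp": 26.4, "RH": 21, "wind": 4.5},
    {"temp": 8.3, "RH": 97, "wind": 4.0},
    {"temp": 30.2, "RH": 22, "wind": 4.9},
]
labels = [1, 0, 0]    # made-up labels, for illustration only

acc = accuracy(f, samples, labels, t=10.0)
```

The accuracy computed this way can serve directly as a fitness measure, with the threshold t either fixed in advance or treated as one more parameter to tune.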
What to do for the experiment?
Select a library that implements GP
  You can find various libraries written in C++/Java/Matlab
  See the list of recommended libraries on the next page
Build up your own code for the experiment
  Check the sample code and tutorials of the libraries for a quick start
  Add comments to explain the flow of your program
Caution
  Running GP may take a long time
Recommended Libraries for GP
C++
  GPLib: http://www.cs.bham.ac.uk/~cmf/GPLib/index.html
Java
  JGAP: http://jgap.sourceforge.net/
  ECJ: http://cs.gmu.edu/~eclab/projects/ecj/
Matlab toolbox
  GPLAB: http://gplab.sourceforge.net/
More References
  Implementations section in Wiki – Genetic Programming:
  http://en.wikipedia.org/wiki/Genetic_programming
Reports Style
English only!!
Scientific journal-style
  How to Write A Paper in Scientific Journal Style and Format
  http://abacus.bates.edu/~ganderso/biology/resources/writing/HTWsections.html
Experimental process           Section of Paper
What did I do in a nutshell?   Abstract
What is the problem?           Introduction
How did I solve the problem?   Materials and Methods
What did I find out?           Results
What does it mean?             Discussion
Who helped me out?             Acknowledgments (optional)
Whose work did I refer to?     Literature Cited
Extra Information              Appendices (optional)
Report Contents (1/3)
System description
  Programming language used and running environment
Result tables
Analysis & discussion (Very Important!!)
Training
             Average   SD   Best   Worst
Setting 1    %         %    %      %
Setting 2    %         %    %      %
Setting 3    %         %    %      %

Your prediction
1   2   …   16   17   Equation
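One row of the training table can be filled from the accuracies of repeated runs, as sketched below. The accuracies are made up, and the use of the sample standard deviation (rather than the population one) is an assumption; state whichever you use in the report.

```python
import statistics

def summarize(accuracies):
    """Return average, sample standard deviation, best and worst (all in %)."""
    return {
        "average": statistics.mean(accuracies),
        "sd": statistics.stdev(accuracies),
        "best": max(accuracies),
        "worst": min(accuracies),
    }

# Made-up training accuracies (%) from five runs of one parameter setting.
runs = [70.0, 72.0, 68.0, 74.0, 66.0]
row = summarize(runs)
print("Setting 1  {average:.1f}%  {sd:.1f}%  {best:.1f}%  {worst:.1f}%".format(**row))
```

Because GP is stochastic, reporting several runs per setting makes the comparison between settings more meaningful than a single run would.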
Report Contents (2/3)
Graph
  Avg., Max. Fitness versus Generation
  Tree size versus Generation
Report Contents (3/3)
Basic experiments
  Changing parameters for the crossover and mutation
  Various function sets: arithmetic, numerical
Optional experiments
  Various selection methods
  Depth limitation
  Population size, generation numbers
  Comparison to Neural Network
  …
References
Submission Guide
Due date: Nov. 19 (Wed) 18:00
Submit both ‘hardcopy’ and ‘email’
  Hardcopy submission to the office (301-417)
  E-mail submission to [email protected]
    Subject : [AI Project1 Report] Student number, Name
    Report + your source code with comments + executable file(s)
Length: the report should be no longer than 12 pages.
We are NOT interested in the accuracy or your programming skill, but in your creativity and research ability.
If your major is not C.S., a team project with a C.S. major student is possible (use the class board to find a partner and notify the TA ([email protected]) of your team by Nov. 5)
Marking Scheme
5 points for programming
5 points for result prediction
30 points for experiment & analysis
  15 pts for experiments, 15 pts for analysis
10 points for report
Late work
  -10% per day
  Maximum 7 days
QnA
Test Data
        X  Y  month  day  FFMC  DMC    DC     ISI   temp  RH  wind  rain
Data01  6  5  9      3    92.9  133.3  699.6  9.2   26.4  21  4.5   0
Data02  6  3  11     2    79.5  3      106.7  1.1   11.8  31  4.5   0
Data03  4  3  7      4    93.2  114.4  560    9.5   30.2  22  4.9   0
Data04  6  5  6      1    90.4  93.3   298.1  7.5   19.1  39  5.4   0
Data05  6  3  4      7    91    14.6   25.6   12.3  13.7  33  9.4   0
Data06  5  4  4      7    91    14.6   25.6   12.3  17.6  27  5.8   0
Data07  4  3  5      5    89.6  25.4   73.7   5.7   18    40  4     0
Data08  7  5  10     1    91.7  48.5   696.1  11.1  16.1  44  4     0
Data09  8  6  3      5    91.7  33.3   77.5   9     8.3   97  4     0.2
Data10  7  5  8      2    96.1  181.1  671.2  14.3  27.3  63  4.9   6.4
Data11  6  5  9      6    91.2  94.3   744.4  8.4   15.4  57  4.9   0
Data12  8  6  8      1    92.1  207    672.6  8.2   21.1  54  2.2   0
Data13  7  4  9      5    88.2  55.2   732.3  11.6  15.2  64  3.1   0
Data14  4  3  9      2    91.9  111.7  770.3  6.5   15.9  53  2.2   0
Data15  3  6  9      7    92.4  124.1  680.7  8.5   17.2  58  1.3   0
Data16  3  6  9      1    90.9  126.5  686.5  7     15.6  66  3.1   0
Data17  9  9  7      2    85.8  48.3   313.4  3.9   18    42  2.7   0