genetic programming
DESCRIPTION
GENETIC PROGRAMMING. John R. Koza Consulting Professor (Biomedical Informatics) Department of Medicine School of Medicine Consulting Professor Department of Electrical Engineering School of Engineering Stanford University Stanford, California 94305 [email protected] - PowerPoint PPT PresentationTRANSCRIPT
GENETIC PROGRAMMING
John R. KozaConsulting Professor (Biomedical Informatics)
Department of Medicine
School of Medicine
Consulting Professor
Department of Electrical Engineering
School of Engineering
Stanford University
Stanford, California 94305
http://www.smi.stanford.edu/people/koza/
THE CHALLENGE OF ARTIFICIAL
INTELLIGENCE
“How can computers learn to solve problems without being explicitly programmed? In other words, how can computers be made to do what is needed to be done, without being told exactly how to do it?”
Attributed to Arthur Samuel (1959)
CRITERION FOR SUCCESS
"The aim [is] ... to get machines to exhibit behavior, which if done by humans, would be assumed to involve the use of intelligence.“
Arthur Samuel (1983)
MAIN POINT No. 1
• Genetic programming now routinely delivers high-return human-competitive machine intelligence
MAIN POINT No. 2
• Genetic programming is an automated invention machine
MAIN POINT No. 3
• Genetic programming has delivered a progression of qualitatively more substantial results in synchrony with five approximately order-of-magnitude increases in the expenditure of computer time
MAIN POINT No. 1
• Genetic programming now routinely delivers high-return human-competitive machine intelligence
“HUMAN-COMPETITIVE”
• The result is equal or better than human-designed solution to the same problem
NASA EVOLVED ANTENNA
X-Band Antenna for NASA's Space Technology 5 Mission in 2004
“HUMAN-COMPETITIVE”
• Previously patented, an improvement over a patented invention, or patentable today
DEFINITION OF “HIGH-RETURN”
The AI ratio (the “artificial-to-intelligence” ratio) of a problem-solving method as the ratio of that which is delivered by the automated operation of the artificial method to the amount of intelligence that is supplied by the human applying the method to a particular problem
DEFINITION OF “ROUTINE”
A problem solving method is routine if it is general and relatively little human effort is required to get the method to successfully handle new problems within a particular domain and to successfully handle new problems from a different domain.
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS
PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
REPRESENTATIONS
• Decision trees• If-then production
rules• Horn clauses• Neural nets• Bayesian networks• Frames• Propositional logic
• Binary decision diagrams
• Formal grammars • Coefficients for
polynomials• Reinforcement
learning tables• Conceptual clusters • Classifier systems
A COMPUTER PROGRAM
DESIRED OUTPUT OF PROGRAMTime Output
0 6
1 6
2 6
3 6
4 6
5 6
6 6
7 6
8 6
9 6
10 6
11 7
12 7
PROGRAM TREE
(+ 1 2 (IF (> TIME 10) 3 4))
GENETIC PROGRAMMING
• Create initial population (random)
• Main generational loop– Execute all programs– Evaluate fitness of all programs– Select single individuals or pairs of individuals
based on fitness to participate in the genetic operations (mutation, crossover, reproduction, architecture-altering operations)
• Termination Criterion
CREATING RANDOM PROGRAMS
CREATING RANDOM PROGRAMS
• Function Set F = {+, -, *, %, IFLTE}
• Terminal Set T = {X, Y, Random-Values}
CREATING RANDOM PROGRAMS
• The random programs are:–Of different sizes and shapes –Syntactically valid–Executable
DARWINIAN SELECTION
• Selection based on fitness
• Better individual more likely to be selected
• Probabilistic selection
- Best is not always picked
- Worst is not necessarily excluded
MUTATION OPERATION
MUTATION OPERATION
• Select 1 parent probabilistically based on fitness• Pick point from 1 to NUMBER-OF-POINTS• Delete subtree at the picked point• Grow new subtree at the mutation point in same
way as generated trees for initial random population (generation 0)
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the population
CROSSOVER OPERATION
CROSSOVER OPERATION
• Select 2 parents probabilistically based on fitness
• Randomly pick a number from 1 to NUMBER-OF-POINTS for 1st parent
• Independently randomly pick a number for 2nd parent
• The result is a syntactically valid executable program
• Put the offspring into the next generation of the population
• Identify the subtrees rooted at the two picked points
REPRODUCTION OPERATION
• Select parent probabilistically based on fitness
• Copy it (unchanged) into the next generation of the population
PROBABILISTIC STEPS
• The initial population is typically random• Probabilistic selection based on fitness
- Best is not always picked
- Worst is not necessarily excluded• Random picking of mutation and crossover
points
ILLUSTRATIVE GP RUN
SYMBOLIC REGRESSION
Independent variable X
Dependent variable Y
-1.00 1.00
-0.80 0.84
-0.60 0.76
-0.40 0.76
-0.20 0.84
0.00 1.00
0.20 1.24
0.40 1.56
0.60 1.96
0.80 2.44
1.00 3.00
5 MAJOR PREPARATORY STEPS OF GP
• Determining the set of terminals• Determining the set of functions• Determining the fitness measure • Determining the parameters for the run• Determining the criterion for terminating a run
PREPARATORY STEPSObjective: Find a computer program with one input
(independent variable X) whose output equals the given data
1 Terminal set: T = {X, Random-Constants}
2 Function set: F = {+, -, *, %}
3 Fitness: The sum of the absolute value of the differences between the candidate program’s output and the given data (computed over numerous values of the independent variable x from –1.0 to +1.0)
4 Parameters: Population size M = 4
5 Termination: An individual emerges whose sum of absolute errors is less than 0.1
SYMBOLIC REGRESSION
POPULATION OF 4 RANDOMLY CREATED INDIVIDUALS FOR GENERATION 0
SYMBOLIC REGRESSION x2 + x + 1
GENERATION 1
Copy of (a)
Mutant of (c) picking “2” as mutation point
First offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
Second offspring of crossover of (a) and (b) picking “+” of parent (a) and left-most “x” of parent (b) as crossover points
CLASSIFICATION
GP TABLEAU – INTERTWINED SPIRALS
Objective: Create a program to classify a given point in the x-y plane to the red or blue spiral
1 Terminal set: T = {X,Y,Random-Constants}
2 Function set: F = {+,-,*,%,IFLTE}
3 Fitness: Accuracy of classification (0 – 194)
4 Parameters: M = 10,000. G = 51
5 Termination: A program is 100% accurate
BOX MOVER – BEST OF GEN 0
BOX MOVERGEN 45 – FITNESS CASE 1
GENETIC PROGRAMMING: ON THE PROGRAMMING OF COMPUTERS BY
MEANS OF NATURAL SELECTION
(Koza 1992)
2 MAIN POINTS FROM 1992 BOOK
• Virtually all problems in artificial intelligence, machine learning, adaptive systems, and automated learning can be recast as a search for a computer program.
• Genetic programming provides a way to successfully conduct the search for a computer program in the space of computer programs.
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
COMPUTER PROGRAMS
• Subroutines provide one way to REUSE code possibly with different instantiations of the dummy variables (formal parameters)
• Loops (and iterations) provide a 2nd way to REUSE code• Recursion provide a 3rd way to REUSE code• Memory provides a 4th way to REUSE the results of
executing code
DIFFERENCE IN VOLUMES
D = L0W0H0 – L1W1H1
EVOLVED SOLUTION
(- (* (* W0 L0) H0) (* (* W1 L1) H1))
AUTOMATICALLY DEFINED FUNCTION volume
AUTOMATICALLY DEFINED FUNCTION volume
(progn
(defun volume (arg0 arg1 arg2)
(values
(* arg0 (* arg1 arg2))))
(values
(- (volume L0 W0 H0)
(volume L1 W1 H1))))
AUTOMATICALLY DEFINED FUNCTIONS (SUBROUTINES)
• ADFs provide a way to REUSE code• Code is typically reused with different
instantiations of the dummy variables (formal parameters)
DIVIDE AND CONQUER
DIVIDE AND CONQUER
• Decompose a problem into sub-problems
• Solve the sub-problems
• Assemble the solutions of the sub-problems into a solution for the overall problem
CHANGE OF REPRESENTATION
CHANGE OF REPRESENTATION
• Identify regularities
• Change the representation
• Solve the overall problem
GENETIC PROGRAMMING II: AUTOMATIC DISCOVERY OF REUSABLE PROGRAMS
(Koza 1994)
MAIN POINTS OF 1994 BOOK
• Scalability is essential for solving non-trivial problems in artificial intelligence, machine learning, adaptive systems, and automated learning
• Scalability can be achieved by reuse• Genetic programming provides a way to
automatically discover and reuse subprograms in the course of automatically creating computer programs to solve problems
MEMORY
Settable (named)
variables
Indexed vector
memory
Matrix
memory Relational memory
TRANSMEMBRANE SEGMENT IDENTIFICATION PROBLEM
(progn (defun ADF0 ()(ORN (ORN (ORN (I?) (H?)) (ORN (P?) (G?))) (ORN (ORN (ORN (Y?)
(N?)) (ORN (T?) (Q?))) (ORN (A?) (H?)))))) (defun ADF1 ()(values (ORN (ORN (ORN (A?) (I?)) (ORN (L?) (W?))) (ORN (ORN
(T?) (L?)) (ORN (T?) (W?)))))) (defun ADF2 ()(values (ORN (ORN (ORN (ORN (ORN (D?) (E?)) (ORN (ORN (ORN
(D?) (E?)) (ORN (ORN (T?) (W?)) (ORN (Q?) (D?)))) (ORN (K?) (P?)))) (ORN (K?) (P?))) (ORN (T?) (W?))) (ORN (ORN (E?) (A?)) (ORN (N?) (R?))))))
(progn (loop-over-residues (SETM0 (+ (- (ADF1) (ADF2)) (SETM3 M0))))
(values (% (% M3 M0) (% (% (% (- L -0.53) (* M0 M0)) (+ (% (% M3 M0) (% (+ M0 M3) (% M1 M2))) M2)) (% M3 M0))))))
ADL
ADR
HUMAN-COMPETITIVE RESULTS(NOT RELATED TO PATENTS)
Transmembrane segment identification problem for proteins
Motifs for D–E–A–D box family and manganese superoxide dismutase family of proteins
Cellular automata rule for Gacs-Kurdyumov-Levin (GKL) problem
Quantum algorithm for the Deutsch-Jozsa “early promise” problem
Quantum algorithm for Grover’s database search problem
Quantum algorithm for the depth-two AND/OR query problem
Quantum algorithm for the depth-one OR query problem
Protocol for communicating information through a quantum gate
Quantum dense coding
Soccer-playing program that won its first two games in the 1997 Robo Cup competition
Soccer-playing program that ranked in the middle of field in 1998 Robo Cup competition
Antenna designed by NASA for use on spacecraft
Sallen-Key filter
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY AND SIZING OF ANALOG ELECTRICAL CIRCUITS
COMPONENT-CREATING FUNCTIONS
• Resistor R function
• Capacitor C function
• Inductor L function
• Diode D function
• Transistor Q function (3-leaded)
COMPONENT-CREATING FUNCTIONS
TOPOLOGY-MODIFYING FUNCTIONS
• SERIES division• PARALLEL division• VIA• FLIP
TOPOLOGY-MODIFYING FUNCTIONS
DEVELOPMENT-CONTROLLING FUNCTIONS
• END function• NOP (No Operation) function• SAFE_CUT function
DEVELOPMENTAL GP
DEVELOPMENTAL GP
(LIST (C (– 0.963 (– (– -0.875 -0.113) 0.880)) (series (flip end) (series (flip end) (L -0.277 end) end) (L (– -0.640 0.749) (L -0.123 end)))) (flip (nop (L -0.657 end)))))
CAPACITOR-CREATING FUNCTION
(LIST (C (– 0.963 (– (– -0.875
-0.113) 0.880)) (series (flip
end) (series (flip end) (L –
0.277 end) end) (L (– -0.640
0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
CAPACITOR-CREATING FUNCTION
SERIES DIVISION FUNCTION
(LIST (C (– 0.963 (– (– -0.875
-0.113) 0.880)) (series (flip
end) (series (flip end) (L –
0.277 end) end) (L (– -0.640
0.749) (L -0.123 end)))) (flip
(nop (L -0.657 end)))))
SERIES DIVISION
EVALUATION OF FITNESS
Program Tree
+ IN OUT z0
Embryonic Circuit
Fully Designed Circuit (NetGraph)
Circuit Netlist (ascii)
Circuit Simulator (SPICE)
Circuit Behavior (Output)
Fitness
DESIRED BEHAVIOR OF A LOWPASS FILTER
EVOLVED CAMPBELL FILTER
U. S. patent 1,227,113George Campbell
American Telephone and Telegraph1917
EVOLVED ZOBEL FILTER
U. S. patent 1,538,964Otto Zobel
American Telephone and Telegraph Company1925
EVOLVED SALLEN-KEY FILTER
EVOLVED DARLINGTON EMITTER-FOLLOWER SECTION
U. S. patent 2,663,806
Sidney Darlington
Bell Telephone Laboratories
1953
NEGATIVE FEEDBACK
HAROLD BLACK’S RIDE ON THE LACKAWANNA FERRY
Courtesy of Lucent Technologies
20th-CENTURY PATENTS
Campbell ladder topology for filters
Zobel “M-derived half section” and “constant K” filter sections
Crossover filter
Negative feedback
Cauer (elliptic) topology for filters
PID and PID-D2 controllers
Darlington emitter-follower section and voltage gain stage
Sorting network for seven items using only 16 steps
60 and 96 decibel amplifiers
Analog computational circuits
Real-time analog circuit for time-optimal robot control
Electronic thermometer
Voltage reference circuit
Philbrick circuit
NAND circuit
Simultaneous synthesis of topology, sizing, placement, and routing
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
SIX POST-2000 PATENTED INVENTIONS
EVOLVED HIGH CURRENT LOAD CIRCUIT
REGISTER-CONTROLLED CAPACITOR CIRCUIT
LOW-VOLTAGE CUBIC CIRCUIT
VOLTAGE-CURRENT-CONVERSION CIRCUIT
LOW-VOLTAGE BALUN CIRCUIT
TUNABLE INTEGRATED ACTIVE FILTER
21st-CENTURY PATENTED INVENTIONS
Low-voltage balun circuit
Mixed analog-digital variable capacitor circuit
High-current load circuit
Voltage-current conversion circuit
Cubic function generator
Tunable integrated active filter
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
NOVELTY-DRIVEN EVOLUTION
• Two factors in fitness measure– Circuit’s behavior– Largest number of nodes and edges (circuit
components) of a subgraph of the given circuit that is isomorphic to a subgraph of a template representing the prior art.
PRIOR ART TEMPLATE
NON-INFRINGING SOLUTION NO. 1
NON-INFRINGING SOLUTION NO. 5
GP AS AN INVENTION MACHINE
AUTOMATIC SYNTHESIS OF CIRCUIT LAYOUT
INCLUDING THE PLACEMENT OF COMPONENTS AND ROUTING OF
WIRES
ALONG WITH THE TOPOLOGY AND SIZING
CIRCUIT LAYOUT
• Circuit placement involves the assignment of each of the circuit's components to a particular physical location on a printed circuit board or silicon wafer.
• Routing involves the assignment of a particular physical location to the wires between the leads of the circuit's components.
LAYOUT
LAYOUT — GENERATION 0
100%-COMPLIANT LOWPASS FILTER
GENERATION 25 WITH 5 CAPACITORS AND 11 INDUCTORS AREA OF 1775.2
100%-COMPLIANT LOWPASS FILTER
BEST-OF-RUN CIRCUIT OF GENERATION 138 WITH 4 INDUCTORS AND 4 CAPACITORS AREA OF 359.4
REVERSE ENGINEERING OF METABOLIC PATHWAYS
USING A TURTLE TO DRAW TWO-DIMENSIONAL ANTENNA
BEST-OF-RUN ANTENNA FROM GENERATION 90
3-DIMENSIONAL ANTENNA
EVOLVED SORTING NETWORK
GENETIC NETWORK FOR lac operon
EVOLVED NETWORK
(IF (< LACTOSE_LEVEL 9.139 ) (IF (<REPRESSOR_LEVEL 6.270 ) (IF (> GLUCOSE_LEVEL5.491 ) 2.02 (IF (< CAP_LEVEL 0.639 ) 2.033 (IF(< CAP_LEVEL 4.858 ) (IF (> LACTOSE_LEVEL 2.511 )(IF (> CAP_LEVEL 7.807 ) 5.586 (IF (>LACTOSE_LEVEL 2.114 ) 1.978 2.137 ) ) 0.0 ) (IF(> REPRESSOR_LEVEL 4.015 ) 0.036 (IF (<GLUCOSE_LEVEL 5.128 ) 10.0 (IF (< REPRESSOR_LEVEL4.268 ) 2.022 9.122 ) ) ) ) ) ) (IF (> CAP_LEVEL0.842 ) 0.0 5.97 ) ) (IF (< CAP_LEVEL 1.769 )2.022 (IF (< GLUCOSE_LEVEL 2.382 ) (IF (>LACTOSE_LEVEL 1.256 ) (IF (> LACTOSE_LEVEL 1.933) (IF (> GLUCOSE_LEVEL 2.022 ) (IF (<GLUCOSE_LEVEL 5.183 ) 6.323 (IF (> CAP_LEVEL1.208 ) 9.713 0.842 ) ) 10.0 ) (IF (>GLUCOSE_LEVEL 6.270 ) 2.109 ) 1.965 ) ) 0.665 )
1.982 ) ) )
OTHER STRUCTURES
GENETIC PROGRAMMING III: DARWINIAN INVENTION AND PROBLEM SOLVING
(Koza, Bennett, Andre, Keane 1999)
SUBROUTINE DUPLICATION
SUBROUTINE CREATION
SUBROUTINE DELETION
ARGUMENT DUPLICATION
ARGUMENT DELETION
16 ATTRIBUTES OF A SYSTEM FOR AUTOMATICALLY CREATING COMPUTER
PROGRAMS
• Starts with "What needs to be done"
• Tells us "How to do it"• Produces a computer
program• Automatic determination of
program size• Code reuse• Parameterized reuse• Internal storage• Iterations, loops, and
recursions
• Self-organization of hierarchies
• Automatic determination of program architecture
• Wide range of programming constructs
• Well-defined• Problem-independent• Wide applicability• Scalable• Competitive with human-
produced results
GENETIC PROGRAMMING PROBLEM SOLVER (GPPS)
PROGRESSION OF QUALITATIVELY MORE SUBSTANTIAL RESULTS PRODUCED BY GP
• Toy problems
• Human-competitive non-patent results
• 20th-century patented inventions
• 21st-century patented inventions
• Patentable new inventions
AUTOMATIC SYNTHESIS OF BOTH THE TOPOLOGY
AND TUNING OF CONTROLLERS
PARAMETERIZED TOPOLOGY FOR GENERAL-PURPOSE CONTROLLER
EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
EVOLVED EQUATIONS FOR GENERAL-PURPOSE CONTROLLER
PATENTABLE NEW INVENTIONS
PID tuning rules that outperform the Ziegler-Nichols and Åström-Hägglund tuning rules
General-purpose controllers outperforming Ziegler-Nichols and Åström-Hägglund rules
PARALLELIZATION WITH SEMI-ISOLATED SUBPOPULATIONS
GP PARALLELIZATION
• Like Hormel, Get Everything Out of the Pig, Including the Oink
• Keep on Trucking
• It Takes a Licking and Keeps on Ticking
• The Whole is Greater than the Sum of the Parts
PETA-OPS
• Human brain operates at 1012 neurons operating at 103 per second = 1015 ops per second
• 1015 ops = 1 peta-op = 1 bs (brain second)
GP 1987–2002
System Dates Speed-up over first system
Human-competitive results
Problem Category
Serial LISP 1987–1994 1 (base) 0 toy problems 64 transputers 1994–1997 9 2 human-competitive
results not related to patented inventions
64 PowerPC’s 1995–2000 204 12 20th-century patented inventions
70 Alpha’s 1999–2001 1,481 2 20th-century patented inventions
1,000 Pentium II’s 2000–2002 13,900 12 21st-century patented inventions
4-week runs on 1,000 Pentium II’s
2002-2003 130,000 2 patentable new inventions
PROMISING GP APPLICATION AREAS
• Problem areas involving many variables that are interrelated in highly non-linear ways
• Inter-relationship of variables is not well understood
• Discovery of the size and shape of the solution is a major part of the problem
PROMISING GP APPLICATION AREAS (CONTINUED)
• "Black art" problems
• Areas where you simply have no idea how to program a solution, but where you know what you want
PROMISING GP APPLICATION AREAS (CONTINUED)
• Problem areas where a good approximate solution is satisfactory
design
control
bioinformatics
classification
data mining
system identification
forecasting
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas where large computerized databases are accumulating and computerized techniques are needed to analyze the data
genome, protein, microarray data
satellite image data
astronomical data
petroleum databases
financial databases
medical records
marketing databases
PROMISING GP APPLICATION AREAS (CONTINUED)
Areas for which humans find it very difficult to write good programs
parallel computers cellular automata multi-agent strategies field-programmable game arrays digital signal processors swarm intelligence
DIFFERENCES BETWEEN GP AND ARTIFICIAL INTELLIGENCE (AI) AND MACHINE LEARNING (ML)
APPROACHES
REPRESENTATION
Genetic programming overtly conducts it
search for a solution to the given problem
in program space
POINT-TO-POINT TRANSFORMATIONS IN THE
SEARCH
Genetic programming does not conduct its
search by transforming a single point in the
search space into another single point, but
instead transforms a set of points into
another set of points
HILL CLIMBING IN THE SEARCH
Genetic programming does not rely
exclusively on greedy hill climbing to
conduct its search, but instead allocates a
certain number of trials, in a principled
way, to choices that are known to be
inferior
DETERMINISM IN THE SEARCH
Genetic programming conducts its search
probabilistically
EXPLICIT KNOWLEDGE BASE
Genetic programming does NOT make use
of a knowledge base
ROLE OF FORMAL LOGIC IN THE SEARCH
Genetic programming does not utilize
formal logic in it’s search strategy. Contradictory alternatives are created and actively maintained.
SOURCE
Genetic programming is biologically inspired.
RESULTS
• Genetic programming now routinely delivers high-return human-competitive machine intelligence