Evolving More Random Number Generators Using Genetic Programming

Joe Barker
CS447 - Dr. D. Tauritz - UM-Rolla
November 25, 2002


Source: web.mst.edu/~tauritzd/courses/ec/fs2002/project/Barker.pdf


Abstract

This paper is based largely on, and is intended to be an improvement of, John Koza's paper "Evolving a Computer Program to Generate Random Numbers Using the Genetic Programming Paradigm" (1991). It describes an application of the Genetic Programming concept to the creation of Pseudo-Random Number Generators (PRNGs). Evolutionary operators are applied to each expression tree, which represents a candidate PRNG. Appropriate re-application of these operators should, in theory, give progressively better PRNGs. The intention of this project is to verify and improve upon Koza's work.

Keywords: EA, Evolutionary Algorithm, Expression Trees, Genetic Programming, PRNG, Pseudo-Random Number Generator


I. Introduction

There are many places in computer science that require a source of random numbers, and more applications for random numbers appear every day.

In the fields of engineering and science, Monte Carlo methods owe their very existence to random numbers, and their effectiveness is directly related to the quality of the random number generator.

The single largest use of random numbers is probably in the field of Software Assurance/Quality Testing. This is probably the first place that computer science students come into contact with random number generation. When asked to show that their code is correct, the most common approach is to try random inputs. The generation and testing of random test cases, while not proving the code correct, does allow the tester to gain some confidence that it probably is. Without this procedure, software testing would be significantly slower, as programmers would have to develop many more test cases by hand.

A final use for random numbers should be mentioned, as it deals directly with this project. All Evolutionary Algorithms (EAs) require a source of random numbers. From creating the initial population to performing mutation and crossover, there is hardly any part of an EA that doesn't require random numbers.

Unfortunately, the generation of these numbers by a computer is not an easy task. A large body of theory lies behind the simple rand() call in the C library. In fact, Knuth takes 16 pages in his book, The Art of Computer Programming Vol. 2 [2], to decide what the appropriate constants are for a simple Pseudo-Random Number Generator (PRNG). He takes another 76 pages to describe various methods for deciding whether a sequence of numbers is even random!

The goal of this project is to use the Evolutionary Algorithm (EA) concept, or more specifically Genetic Programming (GP), to create "good" Pseudo-Random Number Generators. The advantage of taking this route is that very little theory is necessary and hardly any forethought on the structure of the PRNG is required. In fact, the process performs better when fewer preconceptions are placed on it.

This project is intended to follow and improve upon the work by John Koza [1].


II. Background

A. What are Random Numbers?

Unfortunately, mathematicians find it very hard to agree on a definition for random numbers. Therefore, we must develop an approximate definition instead.

The first thing to consider is what kind of random number we're talking about. There are many different possible distributions, such as Gaussian or chi, but the most common target is the uniform distribution, and that is what we will concentrate on from now on.

Unfortunately, even though this is decided, we're no better off than before. We still need some method of determining whether a sequence of numbers is uniformly distributed or not. Luckily, Knuth [2] has provided several tests for uniformity/randomness. A few of these have been selected by the author as appropriate (i.e., simple) for use in checking our trial PRNGs.

1) A theoretical test that can be useful is the chi-square test. It is used to compare a theoretical distribution with a measured one. Given:

    i = 1..n
    Y_i      the observed occurrences for class i
    n·p_i    the theoretical occurrences for class i

it is calculated thus:

    V = Σ_i (Y_i − n·p_i)² / (n·p_i)

We then try to achieve:

    χ²(0.5, n−1) + ε = V,    with ε small

that is, V should lie near the 50th percentile of the chi-square distribution with n−1 degrees of freedom.

2) The first practical test is the frequency or equidistribution test. This test is almost a direct application of the Chi-square test. Each possible value is considered a class and it is checked against the uniform distribution using the Chi-square test. Unfortunately, this is impractical for our purposes. However, if we instead divide the random number space into equal pieces and use these as the classes instead, this test becomes a possibility.
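As a sketch of how these two tests combine in practice (a Python illustration with assumed function names; the paper does not give an implementation), the chi-square statistic can be applied to equal-sized bins of a generator's output:

```python
import random

def chi_square(observed, expected):
    # Knuth's statistic: V = sum over classes of (Y_i - np_i)^2 / (np_i)
    return sum((y - e) ** 2 / e for y, e in zip(observed, expected))

def frequency_test(samples, num_bins, max_value):
    # Divide the output range into equal pieces and compare the observed
    # counts against the uniform distribution.
    counts = [0] * num_bins
    for x in samples:
        counts[x * num_bins // max_value] += 1
    expected = [len(samples) / num_bins] * num_bins
    return chi_square(counts, expected)

# For uniform input, V should land near the degrees of freedom (num_bins - 1).
rng = random.Random(12345)
samples = [rng.randrange(2 ** 32) for _ in range(16381)]
v = frequency_test(samples, 512, 2 ** 32)
```

The sample size 16381 and the 512 bins mirror the figures used later in this paper; both are otherwise arbitrary here.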


3) Another test, similar to the frequency test, is entropy. This is a measure of the disorder of a sequence. Given:

    i = 1..n
    p_i    the fraction of the whole sequence occupied by class i

it is calculated thus:

    E = Σ_i p_i · log(1/p_i)

The classical use of entropy involves checking an N-bit binary sequence where each value is equally likely. When a base-2 log is used, the maximal entropy is N.

4) The last test that we will consider is the gap test. The sequences are examined for gaps between values that lie within a certain range. Thus, with N the maximum possible value, we wish to find all i, j such that:

    α·N ≤ S_i < β·N
    α·N ≤ S_j < β·N
    S_k < α·N  or  S_k ≥ β·N,    for k = i+1 .. j−1

The occurrences of the gap lengths (j − i − 1) are counted and a chi-square test is performed against the following distribution:

    p_n = p·(1 − p)^n,    where p = β − α

Two special cases of this test, (α,β) = (0, 0.5) or (0.5, 1), are called "runs above the mean" and "runs below the mean," respectively.

Better descriptions of these and other tests can be found in Knuth's book [2].
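The entropy calculation is simple enough to sketch directly (a Python illustration; the class counting here assumes each distinct value is its own class):

```python
import math
from collections import Counter

def entropy(sequence):
    # E = sum over classes i of p_i * log2(1 / p_i), where p_i is the
    # fraction of the sequence occupied by class i.
    n = len(sequence)
    return sum((c / n) * math.log2(n / c) for c in Counter(sequence).values())

# Each 3-bit value appearing equally often gives the maximal entropy of 3 bits;
# a constant sequence has entropy 0.
uniform_3bit = list(range(8)) * 100
constant = [5] * 100
```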

B. Putting the Pseudo- in PRNG

"Anyone who considers arithmetical methods of producing random digits is, of course, in a state of sin." - John Von Neumann, 1951

As Von Neumann implies, trying to generate random numbers on a deterministic machine is an exercise in futility. However, because of the need for at least random-seeming numbers, we must proceed.

There are several commonly used classes of PRNGs, most of which are composed of a recurrence relation and require a seed value. These follow:

1) Linear congruential randomizers are some of the most popular PRNGs. Additionally, they are some of the earliest known randomizers. They have the following format:

    x_0 = seed
    x_{i+1} = (a·x_i + b) mod c

where 0 ≤ x_0 < c, 0 ≤ b < c, 0 < a < c, and c is near the machine's word size. With careful choices of a, b and c, a sequence of numbers with a very long period can be generated. Some popular choices are:

    URN08/RANDU:  a = 65539,  b = 0,  c = 2^31
    Park-Miller:  a = 7^5,    b = 0,  c = 2^31 − 1

2) Shift register randomizers are another popular class, usually specified as SR[a,b,c]. They use the following relation:

    x_0 = seed
    t = (ShiftRight(x_i, a) ⊕ x_i) ∧ (2^c − 1)
    x_{i+1} = (ShiftLeft(t, b) ⊕ t) ∧ (2^c − 1)

For a discussion of appropriate choices for a, b and c, see Knuth's book [2]. A common choice of constants is SR[3,28,31].

3) The last common class is the shuffling randomizer. This is actually a kind of meta-randomizer, since it requires two other randomizers to operate. One of them is used to populate and refill an array, and the other is used to select members of this array.
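The linear congruential and shift-register recurrences can be sketched as generators (a Python illustration; the default constants are RANDU's and SR[3,28,31] as quoted in the text):

```python
from itertools import islice

def lcg(seed, a=65539, b=0, c=2 ** 31):
    # Linear congruential recurrence x_{i+1} = (a*x_i + b) mod c,
    # shown here with RANDU's constants.
    x = seed
    while True:
        x = (a * x + b) % c
        yield x

def shift_register(seed, a=3, b=28, c=31):
    # Shift register SR[a,b,c]:
    #   t       = (ShiftRight(x_i, a) XOR x_i) AND (2^c - 1)
    #   x_{i+1} = (ShiftLeft(t, b)  XOR t)     AND (2^c - 1)
    mask = (1 << c) - 1
    x = seed
    while True:
        t = ((x >> a) ^ x) & mask
        x = ((t << b) ^ t) & mask
        yield x

randu_sample = list(islice(lcg(1), 4))            # starts 65539, 393225, ...
sr_sample = list(islice(shift_register(1), 4))
```

RANDU is shown only because its constants appear above; it is a famously poor generator in practice.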

C. Genetic Programming?

"Shut me down! Machines building machines. How perverse." - C3PO (Anthony Daniels), Star Wars: Episode II

Computer scientists have long sought methods to make computer programs write other computer programs. From compilers to the latest in AI, great strides have been made along these lines. One of the newer ideas to help advance this goal is genetic programming (GP).

Over other ideas in machine learning, GP shares the advantages of other EAs: very little theory is needed. All that is really required is to develop an appropriate structure for the individuals that represent a solution, and methods for these individuals to share information and/or develop new information.

Unfortunately, the advantages are also disadvantages. The individuals must be structured in such a way that even if they are modified in small ways, they will still operate. Therefore, simple solutions such as using a character string filled with C code, or a binary string containing machine code, are inappropriate because the chances are too great that a small change will cause wildly different results or even no result at all.

The most common solution to these difficulties is to use expression trees. An expression tree is a graph representation of an expression that removes all question of order of operations and so on. An expression tree for

    3·x² + 5·x − 1/(x + 5)

might look like: (tree diagram not reproduced in this transcription)

Expression trees are advantageous because the mutation and crossover operators are extremely straightforward. With careful definition of the non-terminals, these trees also avoid the stability problems mentioned above.
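As a sketch (in Python, with illustrative names), such a tree can be represented with nested tuples and evaluated recursively:

```python
def evaluate(node, x):
    # Leaves are the variable name 'x' or numeric constants; internal nodes
    # are ('op', left, right) tuples.
    if node == 'x':
        return x
    if isinstance(node, (int, float)):
        return node
    op, left, right = node
    lhs, rhs = evaluate(left, x), evaluate(right, x)
    if op == '+':
        return lhs + rhs
    if op == '-':
        return lhs - rhs
    if op == '*':
        return lhs * rhs
    if op == '/':
        return lhs / rhs
    raise ValueError(op)

# The tree for 3*x^2 + 5*x - 1/(x + 5):
tree = ('-',
        ('+', ('*', 3, ('*', 'x', 'x')),
              ('*', 5, 'x')),
        ('/', 1, ('+', 'x', 5)))
```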


III. Design & Implementation Details

The original plan for this project was to implement all details and then run it to find a solution. However, a test run while the program was partially implemented gave surprising results and therefore the implementation has been divided into a Stage 1 and Stage 2.

A. Individuals

Each individual will be an expression tree as described above. The non-terminals of these trees must be selected so as to represent the common operations necessary to generate random numbers. In his work [1], Koza chose these non-terminals to be:

F = {+, -, *, /, %} (where / and % return 1 if the divisor is 0)

With these operations, clearly, we can only create modifications of the linear congruential randomizer. Thus, an exclusive-OR operation was added to allow shift-type randomizers as well (the shift operation is taken care of elsewhere). So, in this project, the non-terminals are:

F = {+, -, *, /, %, ^}

Finally, when a random tree is created, each non-terminal is equally likely to appear. Also, these operations are defined such that whenever an operation would overflow an unsigned 32-bit number, the high bits are dropped.

For the terminals, Koza initially used:

T = {J, N} (where J is the input value; N is a number 0-3, uniform)

For this experiment, an additional terminal was added. In combination with * or /, a power of two can simulate a shift left or right. So the terminal list was updated to:

T = {J, N, 2**i} (where i = 1-31, equally likely)

When a random tree is created, each type of terminal is equally likely to appear.
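Assuming a Python implementation (the original's language is not stated), the wrapped and protected operators might look like the following; the operands are treated as unsigned, so Python's floor division matches C's behavior here:

```python
MASK = 0xFFFFFFFF  # results wrap to an unsigned 32-bit word: high bits dropped

def add(a, b): return (a + b) & MASK
def sub(a, b): return (a - b) & MASK
def mul(a, b): return (a * b) & MASK
def xor(a, b): return (a ^ b) & MASK

def div(a, b):
    # Protected division: return 1 when the divisor is 0, as in Koza's setup.
    return (a // b) & MASK if b else 1

def mod(a, b):
    # Protected modulus: likewise returns 1 for a zero divisor.
    return (a % b) & MASK if b else 1
```

Note how multiplying by 2**i simulates a left shift of i bits modulo 2^32, and dividing by 2**i a right shift, which is the motivation for the added 2**i terminal.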


B. Evaluation - Fitness

The seed used in evaluation is the same for every individual and chosen randomly every generation.

From the seed, every individual is used to create a sequence of 16381 numbers. Treating this sequence as a binary string, the 5-,6-,7- and 8-bit entropies are calculated, summed and divided by 26 to give the fitness value of the individual for this run. Note that Koza used 1-bit through 7-bit entropies, but this is redundant. To see this, consider that since 2 and 3 can be divided into 6, then three 2-bit sequences and two 3-bit sequences will map into one 6-bit sequence. Therefore, the occurrences of the 2-bit sequences will be related to the occurrences of the 6-bit sequences. In fact, as the 6-bit entropy approaches 6, the 2-bit entropy must approach 2 and likewise on the 3-bit entropy. Therefore, we need only check higher-bit entropies.
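The fitness computation described above might be sketched as follows (a Python illustration; the windowing of the bit string into non-overlapping k-bit chunks is my assumption, as the paper does not specify it):

```python
import math
from collections import Counter

def k_bit_entropy(bits, k):
    # Entropy over non-overlapping k-bit windows of the bit string.
    chunks = [tuple(bits[i:i + k]) for i in range(0, len(bits) - k + 1, k)]
    n = len(chunks)
    return sum((c / n) * math.log2(n / c) for c in Counter(chunks).values())

def fitness(numbers, width=32):
    # Treat the whole sequence as one binary string, then sum the 5- through
    # 8-bit entropies and normalise by the maximum, 5 + 6 + 7 + 8 = 26.
    bits = [(x >> i) & 1 for x in numbers for i in range(width - 1, -1, -1)]
    return sum(k_bit_entropy(bits, k) for k in (5, 6, 7, 8)) / 26.0

import random
rng = random.Random(7)
sample_fitness = fitness([rng.randrange(2 ** 32) for _ in range(2000)])
```

A good generator scores close to 1.0 and a constant one scores 0.0.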

In Stage 1, these fitness values are used directly. While this is not strictly correct, since fitness is a dynamic value (the seed changes), the method still seemed to work.

In Stage 2, the entropy values are averaged over several generations and normalized. Then frequency and gap tests are performed. The chi-square percentile p is computed and run through the following formula:

    F = 0.25 − (p − 0.5)²

This makes the ideal value of p=0.5 have the highest fitness value. These three values are summed to obtain the final fitness value.

C. Crossover

Stage 1 does not use crossover.

Stage 2 does. Crossover is performed by swapping subtrees (regardless of size) between the parents. The parents are selected uniformly. While this selection will cause slower convergence, it will help to stave off stagnation.
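Subtree crossover is straightforward on a tree encoding. A sketch (Python, with trees as [op, left, right] lists and the helper names my own):

```python
import random

def nodes(tree, path=()):
    # Enumerate (path, subtree) pairs; a path is a tuple of child indices.
    yield path, tree
    if isinstance(tree, list):
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def replace_at(tree, path, subtree):
    # Return a copy of tree with the node at `path` replaced by `subtree`.
    if not path:
        return subtree
    new = list(tree)
    new[path[0]] = replace_at(tree[path[0]], path[1:], subtree)
    return new

def crossover(parent_a, parent_b, rng):
    # Swap one uniformly chosen subtree of each parent, regardless of size.
    path_a, sub_a = rng.choice(list(nodes(parent_a)))
    path_b, sub_b = rng.choice(list(nodes(parent_b)))
    return (replace_at(parent_a, path_a, sub_b),
            replace_at(parent_b, path_b, sub_a))

parent_a = ['+', 'J', ['*', 'J', 2]]
parent_b = ['^', 'J', 3]
child_a, child_b = crossover(parent_a, parent_b, random.Random(4))
```

Because whole subtrees are exchanged, the combined node count of the two children always equals that of the two parents.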


D. Mutation

Mutation is performed by replacing a subtree with a randomly generated tree. The depth of the random tree is set to be roughly equal to the size of the smallest branch of the replaced subtree.

Stage 1 selects an individual uniformly from the top half of the population to be mutated.

Stage 2 performs mutation on the crossover children a specified percentage of the time.
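Mutation by subtree replacement might be sketched as follows (Python; the descent probability and the replacement depth of 2 are illustrative constants, not the author's, and the terminal/non-terminal sets are those defined in Section III.A):

```python
import random

FUNCS = ['+', '-', '*', '/', '%', '^']

def random_tree(depth, rng):
    # Grow a random full tree; terminals are J, a small constant N in 0..3,
    # or a power of two 2**i with i in 1..31, each kind equally likely.
    if depth == 0:
        kind = rng.randrange(3)
        if kind == 0:
            return 'J'
        if kind == 1:
            return rng.randrange(4)
        return 2 ** rng.randrange(1, 32)
    return [rng.choice(FUNCS),
            random_tree(depth - 1, rng),
            random_tree(depth - 1, rng)]

def mutate(tree, rng, depth=2):
    # Walk down the tree; at each step either replace the current subtree
    # with a freshly grown one, or recurse into a random child.
    if not isinstance(tree, list) or rng.random() < 0.3:
        return random_tree(depth, rng)
    new = list(tree)
    idx = rng.randrange(1, len(new))
    new[idx] = mutate(new[idx], rng, depth)
    return new

t = random_tree(2, random.Random(2))
m = mutate(t, random.Random(3))
```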

E. Competition

In Stage 1, the bottom 50% by fitness of the population is replaced by the newly created children.

In Stage 2, the program was modified to be nearly steady-state. It only creates two children per iteration, and these replace the bottom two mature members of the population. These children will take some specified number of iterations to mature, after which they can reproduce and be replaced.
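One iteration of this near-steady-state scheme might look like the following (a Python sketch; the member fields and maturity threshold are illustrative, not the author's exact layout):

```python
import random

def steady_state_step(population, breed, evaluate, rng, maturity=5):
    # One iteration: age everyone, breed two children from uniformly chosen
    # parents, then replace the two worst *mature* members of the population.
    for member in population:
        member['age'] += 1
    pa, pb = rng.sample(population, 2)
    children = breed(pa['genome'], pb['genome'])
    mature = [m for m in population if m['age'] >= maturity]
    worst = sorted(mature, key=lambda m: m['fitness'])[:2]
    for member, genome in zip(worst, children):
        member['genome'] = genome
        member['fitness'] = evaluate(genome)
        member['age'] = 0            # newborns must mature before replacement

# Toy usage with integer "genomes" and fitness equal to the genome value:
population = [{'genome': g, 'fitness': float(g), 'age': 10} for g in range(6)]
steady_state_step(population,
                  breed=lambda a, b: (100, 200),
                  evaluate=float,
                  rng=random.Random(0))
```

The age counter is what gives the children their protected maturation period before they can be replaced in turn.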


IV. Discussion of Results

A. Evolved PRNGs

After roughly 4700 generations, the top individual from Stage 1 had a fitness of 0.999999 and had lived for 3626 generations. It had a depth of 8 with a total of 49 nodes. The simplified version, with 31 nodes, follows:

(+,(/,(N,133),(-,(%,(*,(J),(+,(N,2),(J))),(N,3)),(^,(J),(N,3)))), (-,(-,(+,(%,(N,1),(-,(J),(N,1))),(J)),(N,3288334336)), (-,(N,33562624),(%,(N,0),(/,(J),(N,3)))) ))


For Stage 2, the program was run on 5 P4 2.0 GHz machines for roughly 10 hours. Of the 5 runs, the best candidate was (in simplified form):

(^,(N,1),(+,(+,(^,(J),(/,(/,(2,23),(*,(J),(+,(J),(N,525312)))),(-,(N,1),(J)))),(+,(*,(J),(2,6)),(N,1))),(*,(^,(+,(^,(J),(%,(N,0),(^,(2,7),(J)))),(+,(*,(J),(2,6)),(N,1))),(/,(2,13),(J))),(2,22))))

This candidate did well in both the frequency and gap tests. It also had a high value in the entropy test, but not the highest. Also, this candidate was one of the smaller solutions, having 41 nodes in the simplified version(originally 61).


B. Performance

First we will compare the results of these two PRNGs plus two commonly used PRNGs using the entropy test.

Entropy Test

    PRNG          Avg. Entropy   Std. Dev.
    Stage 1       25.999974      5.52E-07
    Stage 2       25.108252      0.0348352
    R250          25.932738      8.43E-05
    glibc rand()  25.932582      7.79E-05
    Ideal         26.000000      0.000000

As can be seen, the Stage 1 randomizer compares quite favorably with the commercial randomizers here. This makes sense since it was bred specifically to achieve high bit entropy. The Stage 2 randomizer is the bottom of the pack here in two ways. First, its average entropy is somewhat lower, though not enough to make it a poor choice. Second, since the std. deviation is higher, it seems that the randomizer is more sensitive to the choice of seed value.

Second, we will apply the frequency test by dividing each PRNG's output range into 512 sequential, equally sized regions.

Frequency Test

    PRNG          Chi-Sq. Statistic   Chi-Sq. Percentile
    Stage 1           1.43748            0.0000%
    Stage 2         509.50400           47.7160%
    R250            537.42400           78.8859%
    glibc rand()    488.47300           23.3957%
    Ideal           511.33349           50.0000%

Here we see the Stage 2 randomizer start to shine. It approaches the ideal value for the frequency test. This means that the randomizer is neither too non-uniform (0%) nor unnaturally uniform (100%). As you can see, the other randomizers don't do quite so well. In fact, the Stage 1 result is very poor.


Last, we will use the gap test to measure performance. Specifically, we will perform the "gap above the mean" test.

Gap Test (above the mean)

    PRNG          Chi-Sq. Statistic   Chi-Sq. Percentile
    Stage 1       3160280.00000         100.0000%
    Stage 2             7.58965          33.1510%
    R250              173.47400         100.0000%
    glibc rand()        7.23900          29.7294%
    Ideal               9.34182          50.0000%

Here we see again that the Stage 1 randomizer does quite poorly. Surprisingly, R250 also does poorly on this test. Even the glibc rand() function doesn't do so well. Again the Stage 2 randomizer performs best, albeit by a slight margin.

V. Conclusion

As shown, it is possible and even practical to breed a computer program to generate random numbers. In fact the results are good enough that with a little more testing, the Stage 2 randomizer might be good enough for production use.

Also, I have confirmed Koza's result that breeding purely on the basis of bit entropy is not sufficient to produce a natural-seeming randomizer. Adding two additional tests to the fitness function produced a randomizer that can compete with commercial randomizers. I attribute this to the fact that the three tests used conflict somewhat. That is, I believe it is not possible to achieve the ideal values on all three tests. This puts more pressure on the EA as it must give up one thing to increase something else and I think this causes better solutions to occur.


VI. Bibliography

[1] Koza, John R., Evolving a Computer Program to Generate Random Numbers Using the Genetic Programming Paradigm, Proceedings of the Fourth International Conference on Genetic Algorithms, Morgan Kaufmann Publishers, Inc., pages 37-44, 1991. http://citeseer.nj.nec.com/john91evolving.html

[2] Knuth, D. E., The Art of Computer Programming, Volume 2, Second Edition, Addison-Wesley, pages 9-114, Reading, MA, 1981.

[3] Koza, John R., Genetically Breeding Populations of Computer Programs to Solve Problems in Artificial Intelligence, Proceedings of the Second International Conference on Tools for AI, Washington, November 1990, IEEE Computer Society Press, Los Alamitos, CA, 1990. http://citeseer.nj.nec.com/koza90genetically.html

[4] Kinnear, Kenneth E. Jr., Evolving a Sort: Lessons in Genetic Programming, Proceedings of the 1993 International Conference on Neural Networks, Volume 2, IEEE Press, 1993. http://citeseer.nj.nec.com/kinnear93evolving.html

[5] Kinnear, Kenneth E. Jr., Generality and Difficulty in Genetic Programming: Evolving a Sort, Proceedings of the Fifth International Conference on Genetic Algorithms, Morgan Kaufmann Publishers, Inc., pages 287-294, 1993. http://citeseer.nj.nec.com/kinnear93generality.html
