random number generation - home - rice university

Dr. John Mellor-Crummey Department of Computer Science Rice University [email protected] Random Number Generation COMP 528 Lecture 21 5 April 2005


Page 1: Random Number Generation - Home - Rice University

Dr. John Mellor-Crummey

Department of Computer Science
Rice University

[email protected]

Random Number Generation

COMP 528 Lecture 21 5 April 2005

Page 2: Random Number Generation - Home - Rice University


Topics for Today

Understand
• Motivation
• Desired properties of a good generator
• Linear congruential generators
  —multiplicative and mixed
• Tausworthe generators
• Combined generators
• Seed selection
• Myths about random number generation
• What’s used today: MATLAB, R, Linux

Page 3: Random Number Generation - Home - Rice University


Why Random Number Generation?

• Simulation must generate random values for variables in a specified random distribution
  —examples: normal, exponential, …
• How? Two steps
  —random number generation: generate a sequence of uniform FP random numbers in [0,1]
  —random variate generation: transform a uniform random sequence to produce a sequence with the desired distribution
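The two steps can be sketched in Python as follows. The inverse-transform formula for the exponential distribution is an assumption chosen for illustration (the slide does not name a variate-generation method), and the function name `exponential_variate`, the seed, and the rate are hypothetical.

```python
import math
import random

def exponential_variate(u, rate):
    """Map a uniform value u in [0, 1) to an exponential variate by inverse transform."""
    # Inverse CDF of the exponential distribution: F^{-1}(u) = -ln(1 - u) / rate
    return -math.log(1.0 - u) / rate

# Step 1: uniform random numbers (seeded so the run is reproducible)
rng = random.Random(528)
uniforms = [rng.random() for _ in range(10000)]

# Step 2: transform the uniform sequence into the desired distribution
variates = [exponential_variate(u, rate=2.0) for u in uniforms]

sample_mean = sum(variates) / len(variates)
print(f"sample mean = {sample_mean:.3f} (theoretical mean 1/rate = 0.5)")
```

Any other variate-generation method could replace step 2 without touching step 1, which is the point of separating the two.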

Page 4: Random Number Generation - Home - Rice University


How Random Number Generators Work

• Most commonly use recurrence relation
    $x_n = f(x_{n-1}, x_{n-2}, \ldots)$
  recurrence is a function of the last one (or a few) numbers, e.g.
    $x_n = (5x_{n-1} + 1) \bmod 16$
• Example:
  —For $x_0 = 5$, first 32 numbers are 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5, 10, 3, 0, 1, 6, 15, 12, 13, 2, 11, 8, 9, 14, 7, 4, 5
  —x’s are integers in [0,15]
  —dividing by 16, get random numbers in interval [0,1)
• Properties of pseudo-random number sequences
  —from seed value, can determine entire sequence
  —they pass statistical tests for randomness
  —reproducibility (often desirable)
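The example sequence can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `lcg_sequence` is hypothetical.

```python
def lcg_sequence(a, b, m, seed, count):
    """Return the first `count` values x_1, x_2, ... of x_n = (a*x_{n-1} + b) mod m."""
    values = []
    x = seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values

xs = lcg_sequence(a=5, b=1, m=16, seed=5, count=32)
us = [x / 16 for x in xs]      # scaled to uniform values in [0, 1)

print(xs[:16])
print(xs[16:] == xs[:16])      # the second 16 values repeat the first 16
```

Running it with the same seed always yields the same list, which is the reproducibility property noted above.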

Page 5: Random Number Generation - Home - Rice University


Random Number Sequences

• Some generators do not repeat the initial part of a sequence

[Figure: a random number sequence with the labels "tail", "cycle length", and "period"]
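A rough way to measure the tail and the cycle length of a generator is to record the index at which each value first appears and stop at the first repeat. The sketch below assumes the generator's whole state is the single value x; the function name and the quadratic example map are hypothetical.

```python
def tail_and_cycle(step, seed):
    """Return (tail_length, cycle_length) of the sequence seed, step(seed), step(step(seed)), ..."""
    first_seen = {}          # value -> index at which it first appeared
    x, index = seed, 0
    while x not in first_seen:
        first_seen[x] = index
        x = step(x)
        index += 1
    tail = first_seen[x]             # number of values before the cycle starts
    cycle = index - first_seen[x]    # length of the repeating part
    return tail, cycle

# The mixed LCG x_n = (5 x_{n-1} + 1) mod 16: no tail, every value is on the cycle
print(tail_and_cycle(lambda x: (5 * x + 1) % 16, 5))

# A hypothetical non-invertible map: 0, 1, 2 form a tail before 5, 10 repeat
print(tail_and_cycle(lambda x: (x * x + 1) % 16, 0))
```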

Page 6: Random Number Generation - Home - Rice University


Desired Properties of a Good Generator

• Efficiently computable
• Period should be large
  —don’t want random numbers in a simulation to recycle
• Successive values should be
  —independent
  —uniformly distributed

Page 7: Random Number Generation - Home - Rice University


Linear-Congruential Generators

• 1951: D.H. Lehmer found that residues of successive powers of a number have good randomness
    $x_n = a^n \bmod m$; after computing $x_{n-1}$, $x_n = a x_{n-1} \bmod m$
    ($a$: multiplier, $m$: modulus)
• Lehmer’s generator: multiplicative LCG
• Modern generalization: mixed LCG
    $x_n = (a x_{n-1} + b) \bmod m$, with $a, b, m > 0$
• Result: $x_n$ are integers in [0, m-1]
• Popular because
  —analyzed easily
  —certain guarantees can be made about their properties
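The equivalence between the power form and the recurrence form of Lehmer's generator can be checked directly. In this sketch the values a = 3 and m = 31 are illustrative, and the starting value $x_0 = 1$ is an assumption that makes $x_n = a^n \bmod m$ hold exactly.

```python
def lehmer_by_recurrence(a, m, count):
    """x_n = a * x_{n-1} mod m, starting from x_0 = 1 so that x_n = a^n mod m."""
    values, x = [], 1
    for _ in range(count):
        x = (a * x) % m
        values.append(x)
    return values

a, m = 3, 31      # illustrative multiplier and modulus
by_recurrence = lehmer_by_recurrence(a, m, 10)
by_powers = [pow(a, n, m) for n in range(1, 11)]   # built-in modular exponentiation

print(by_recurrence)
print(by_recurrence == by_powers)
```

The recurrence needs one multiplication per value, which is why it is the form used in practice.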

Page 8: Random Number Generation - Home - Rice University


Properties of LCGs

    $x_n = (a x_{n-1} + b) \bmod m$
• Choice of a, b, m affects
  —period
  —autocorrelation
• Observations about LCGs
  —period can never be more than m ⇒ modulus m should be large
  —$m = 2^k$ yields efficient implementation by truncation
  —if b is non-zero, obtain period of m iff
    – m and b are relatively prime
    – every prime that is a factor of m is also a factor of a - 1
    – if m is a multiple of 4, a - 1 must be too
    – all of these conditions are met if
      • $m = 2^k$, for some integer k
      • a = 4c + 1, for some integer c
      • b is an odd integer
• Full-period generator = one with period m
  —not all are equally good
  —lower autocorrelation between adjacent elements = better
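The three full-period conditions are easy to test mechanically and to compare against a brute-force measurement. The sketch below uses trial division, which is only sensible for small moduli; all function names are hypothetical.

```python
from math import gcd

def prime_factors(n):
    """Return the set of prime factors of n (trial division; fine for small n)."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def meets_full_period_conditions(a, b, m):
    """Check the three conditions listed above for x_n = (a*x_{n-1} + b) mod m."""
    if b == 0 or gcd(b, m) != 1:
        return False                      # m and b must be relatively prime
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False                      # every prime factor of m must divide a - 1
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False                      # if 4 divides m, 4 must divide a - 1
    return True

def cycle_length(a, b, m, seed=0):
    """Measure the cycle length by iterating until a value repeats."""
    first_seen, x, n = {}, seed, 0
    while x not in first_seen:
        first_seen[x] = n
        x = (a * x + b) % m
        n += 1
    return n - first_seen[x]

for a, b in [(5, 1), (3, 1), (5, 2)]:
    print(a, b, meets_full_period_conditions(a, b, 16), cycle_length(a, b, 16))
```

For m = 16 the predicate and the measured cycle length agree for every multiplier and increment, which is a quick sanity check on the conditions.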

Page 9: Random Number Generation - Home - Rice University


Example: Two Candidate LCGs

Which is better?
    $x_n = ((2^{34} + 1)\,x_{n-1} + 1) \bmod 2^{35}$
    $x_n = ((2^{18} + 1)\,x_{n-1} + 1) \bmod 2^{35}$

• Both must be full period generators: in $x_n = (a x_{n-1} + b) \bmod m$,
  —$m = 2^k$, for some integer k
  —a = 4c + 1, for some integer c
  —b is an odd integer
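The slide leaves the question open. One simple probe, offered here as an illustration rather than a formal autocorrelation analysis, is to count how many different step sizes $(x_n - x_{n-1}) \bmod m$ each generator takes. The seed and helper names are hypothetical.

```python
M = 2**35

def lcg_stream(a, b, m, seed, count):
    """Return x_0 (the seed) followed by `count` further values of the LCG."""
    values = [seed]
    for _ in range(count):
        values.append((a * values[-1] + b) % m)
    return values

def distinct_steps(values, m):
    """Set of successive differences (x_n - x_{n-1}) mod m."""
    return {(cur - prev) % m for prev, cur in zip(values, values[1:])}

seed = 123456789          # arbitrary illustrative seed
first = lcg_stream(2**34 + 1, 1, M, seed, 1000)
second = lcg_stream(2**18 + 1, 1, M, seed, 1000)

print(len(distinct_steps(first, M)))    # number of different step sizes, first generator
print(len(distinct_steps(second, M)))   # number of different step sizes, second generator
print([round(x / M, 3) for x in first[:8]])
```

With the multiplier $2^{34} + 1$ every step adds either 1 or $2^{34} + 1$, because $2^{34} x \bmod 2^{35}$ depends only on whether x is odd, so consecutive values are strongly related even though the period is full.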

Page 10: Random Number Generation - Home - Rice University


Multiplicative LCGs

• More efficient than mixed LCGs: no addition
• Two classes: $m = 2^k$, $m \ne 2^k$

Page 11: Random Number Generation - Home - Rice University


Multiplicative LCG with $m = 2^k$

    $x_n = a^n \bmod 2^k$
• Most efficient LCG: mod = truncation
• Not full-period: maximum possible period for $m = 2^k$ is $2^{k-2}$
  —only possible if multiplier $a = 8i \pm 3$ and $x_0$ is odd
  —consider
    $x_n = 5x_{n-1} \bmod 2^5$ (lcg_m2k_good)
    $x_n = 7x_{n-1} \bmod 2^5$ (lcg_m2k_bad)
• If $2^{k-2}$ period suffices, may use multiplicative LCG for efficiency
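The periods of the two example generators can be measured by brute force. This sketch assumes the multiplier is odd, so the map is invertible modulo $2^k$ and the seed always recurs; the function name `period` is hypothetical.

```python
def period(a, m, seed):
    """Steps of x_n = a*x_{n-1} mod m until the seed recurs (assumes gcd(a, m) == 1)."""
    x, steps = (a * seed) % m, 1
    while x != seed:
        x = (a * x) % m
        steps += 1
    return steps

m = 2**5
print(period(5, m, seed=1))   # a = 8*1 - 3, odd seed: reaches the maximum 2^(5-2)
print(period(7, m, seed=1))   # a is not of the form 8i +/- 3: shorter period
print(period(5, m, seed=2))   # good multiplier but even seed: shorter period
```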

Page 12: Random Number Generation - Home - Rice University


Multiplicative LCG with $m \ne 2^k$

    $x_n = a^n \bmod m$, $m \ne 2^k$
• Avoid small period of LCG when $m = 2^k$: use prime modulus
• Full period generator with proper choice of a
  —when a is primitive root of m
    – i.e. $a^n \bmod m \ne 1$ for n = 1, 2, …, m-2
• Consider
    $x_n = 3x_{n-1} \bmod 31$ (lcg_mprime_good)
    $x_n = 5x_{n-1} \bmod 31$ (lcg_mprime_bad)
    Note: $5^3 \bmod 31 = 125 \bmod 31 = 1$
• Observations
  —unlike mixed LCG, $x_n$ can never be 0 when m is prime
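The primitive-root condition translates directly into code. The sketch below is a brute-force check suited only to small prime moduli; both function names are hypothetical.

```python
def is_primitive_root(a, m):
    """True if a^n mod m != 1 for n = 1, ..., m-2 (m assumed prime)."""
    return all(pow(a, n, m) != 1 for n in range(1, m - 1))

def multiplicative_order(a, m):
    """Smallest n >= 1 with a^n mod m == 1; this is the generator's period (gcd(a, m) == 1)."""
    n, x = 1, a % m
    while x != 1:
        x = (x * a) % m
        n += 1
    return n

for a in (3, 5):
    print(a, is_primitive_root(a, 31), multiplicative_order(a, 31))
```

For a prime modulus the largest possible period is m - 1, since 0 is never produced from a non-zero seed.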

Page 13: Random Number Generation - Home - Rice University


Examining Bits of a Multiplicative LCG

    $x_n = 25{,}173\,x_{n-1} \bmod 2^{16}$

testgenerator(@r1,1,20)
  n   decimal   binary
---  --------  -----------------
  1     25173  01100010 01010101
  2     12345  00110000 00111001
  3     54509  11010100 11101101
  4     27825  01101100 10110001
  5     55493  11011000 11000101
  6     25449  01100011 01101001
  7     13277  00110011 11011101
  8     53857  11010010 01100001
  9     64565  11111100 00110101
 10      1945  00000111 10011001
 11      6093  00010111 11001101
 12     24849  01100001 00010001
 13     48293  10111100 10100101
 14     52425  11001100 11001001
 15     61629  11110000 10111101
 16     18625  01001000 11000001
 17      2581  00001010 00010101
 18     25337  01100010 11111001
 19     11949  00101110 10101101
 20     47473  10111001 01110001

Bit 1 is the least-significant (rightmost) bit.
bit 1: always 1
bit 2: always 0
bit 3: cycle (10) of length 2
bit 4: cycle (0110) of length 4

In general: kth bit follows cycle of length $2^{k-2}$, k ≥ 2

Typical of multiplicative LCG with modulus $2^k$
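The table can be regenerated with a short Python sketch. It assumes that the second argument of `testgenerator` is the seed (1) and the third is the number of values (20); the helper name `multiplicative_lcg` is hypothetical.

```python
def multiplicative_lcg(a, m, seed, count):
    """Return x_1..x_count of x_n = a * x_{n-1} mod m."""
    values, x = [], seed
    for _ in range(count):
        x = (a * x) % m
        values.append(x)
    return values

values = multiplicative_lcg(a=25173, m=2**16, seed=1, count=20)
for n, x in enumerate(values, start=1):
    bits = format(x, "016b")                       # 16-bit binary string
    print(f"{n:3d} {x:6d}  {bits[:8]} {bits[8:]}")

# Low-order bits, counting bit 1 as the least-significant bit
bit1 = [x & 1 for x in values]
bit2 = [(x >> 1) & 1 for x in values]
print(bit1)
print(bit2)
```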

Page 14: Random Number Generation - Home - Rice University


Examining Bits of a Mixed LCG

    $x_n = (25{,}173\,x_{n-1} + 13{,}849) \bmod 2^{16}$

testgenerator(@r2,1,20)
  n   decimal   binary
---  --------  -----------------
  1     39022  10011000 01101110
  2     61087  11101110 10011111
  3     20196  01001110 11100100
  4     45005  10101111 11001101
  5      3882  00001111 00101010
  6     21259  01010011 00001011
  7     65216  11111110 11000000
  8     19417  01001011 11011001
  9     30502  01110111 00100110
 10     20919  01010001 10110111
 11     26076  01100101 11011100
 12     16421  01000000 00100101
 13     44130  10101100 01100010
 14     63139  11110110 10100011
 15     32824  10000000 00111000
 16     14513  00111000 10110001
 17     51934  11001010 11011110
 18     36303  10001101 11001111
 19     35284  10001001 11010100
 20      8573  00100001 01111101

bit 1: cycle (10) of length 2
bit 2: cycle (1100) of length 4
bit 3: cycle (11110000) of length 8

In general: kth bit follows cycle of length $2^k$

Typical of mixed LCG with modulus $2^k$
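The cycle length of each bit position can be measured rather than read off the table. This sketch assumes seed 1 and searches for the smallest shift under which a bit stream matches itself over a 1024-value window; the function names are hypothetical.

```python
def mixed_lcg(a, b, m, seed, count):
    """Return x_1..x_count of x_n = (a * x_{n-1} + b) mod m."""
    values, x = [], seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values

def bit_period(values, bit):
    """Smallest p such that bit `bit` (1 = least significant) repeats with period p over `values`."""
    bits = [(x >> (bit - 1)) & 1 for x in values]
    for p in range(1, len(bits)):
        if all(bits[i] == bits[i + p] for i in range(len(bits) - p)):
            return p
    return len(bits)

values = mixed_lcg(25173, 13849, 2**16, seed=1, count=1024)
for k in range(1, 9):
    print(k, bit_period(values, k))
```

The window of 1024 values is long enough to resolve periods up to 256, so only bits 1 through 8 are reported.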

Page 15: Random Number Generation - Home - Rice University


LCG Cautions

• Properties guaranteed only if
  —computations are exact: no roundoff
    – use integer arithmetic without overflow
• Low-order bits not very random, high-order bits better
  —if one wants k bits and k < machine word length
    – better to choose high-order k bits than low-order k bits
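The difference between taking low-order and high-order bits shows up quickly with the 16-bit mixed LCG from the preceding slides. This is a small sketch with k = 3 chosen arbitrarily; the helper name is hypothetical.

```python
def mixed_lcg(a, b, m, seed, count):
    """Return x_1..x_count of x_n = (a * x_{n-1} + b) mod m."""
    values, x = [], seed
    for _ in range(count):
        x = (a * x + b) % m
        values.append(x)
    return values

WORD_BITS = 16
k = 3
values = mixed_lcg(25173, 13849, 2**WORD_BITS, seed=1, count=32)

low = [x & ((1 << k) - 1) for x in values]       # low-order k bits
high = [x >> (WORD_BITS - k) for x in values]    # high-order k bits

print(low)    # repeats every 2^k = 8 values
print(high)   # no such short repetition
```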

Page 16: Random Number Generation - Home - Rice University


Tausworthe Generators

• Significant interest in huge random numbers
  —cryptographic applications want many-bit random numbers
  —produce k-bit numbers by
    – produce random sequence of bits
    – chunk bit stream into k-bit quantities
• 1965: Tausworthe generator
    $b_n = c_{q-1} b_{n-1} \oplus c_{q-2} b_{n-2} \oplus c_{q-3} b_{n-3} \oplus \cdots \oplus c_0 b_{n-q}$
    $c_i$ and $b_i$ are binary variables
    $\oplus$ is the xor operation (mod 2 addition)
  —uses last q bits of bit stream to compute next bit
    – autoregressive, order q: AR(q)
• AR(q) generator maximum period = $2^q - 1$

Page 17: Random Number Generation - Home - Rice University


Tausworthe Generator Notation

• Characteristic polynomial notation
    $x^7 + x^3 + 1$   (characteristic polynomial)
    $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0,1,2,...
    $b_{n+7} = b_{n+3} \oplus b_n$, n = 0,1,2,...
    $b_n = b_{n-4} \oplus b_{n-7}$, n = 7,8,9,...
• Most polynomials for Tausworthe generators are trinomials
• Period depends on characteristic polynomial
  —if period = $2^q - 1$, characteristic polynomial is primitive polynomial
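The recurrence for $x^7 + x^3 + 1$ can be run directly to see the bit stream and its period. The all-ones seed below is an assumption made for illustration (any non-zero seed would do), and the function name is hypothetical.

```python
def tausworthe_bits(seed_bits, count):
    """Extend seed_bits (b_0..b_6) to `count` bits using b_n = b_{n-4} XOR b_{n-7}."""
    bits = list(seed_bits)
    for n in range(len(bits), count):
        bits.append(bits[n - 4] ^ bits[n - 7])
    return bits

bits = tausworthe_bits([1] * 7, 2 * 127 + 7)
print("".join(map(str, bits[:21])))

# Smallest shift p under which the generated window matches itself
period = next(p for p in range(1, 128)
              if all(bits[i] == bits[i + p] for i in range(len(bits) - p)))
print(period)
```

A period of $2^7 - 1 = 127$ is the maximum for q = 7, consistent with the polynomial being primitive.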

Page 18: Random Number Generation - Home - Rice University


Implementing Tausworthe Generators

• Linear feedback shift registers

    [Figure: shift register with cells $b_{n-1}, b_{n-2}, \ldots, b_{n-7}$, input $b_n$, and output "out"]
    $x^7 + x^3 + 1$
    $b_{n+7} \oplus b_{n+3} \oplus b_n = 0$, n = 0,1,2,...
    $b_{n+7} = b_{n+3} \oplus b_n$, n = 0,1,2,...
    $b_n = b_{n-4} \oplus b_{n-7}$, n = 7,8,9,...
• Disadvantage of Tausworthe generators
  —while sequence is good overall, local behavior may not be
    – known to perform negatively on runs up and down test
  —first-order serial correlation almost 0
  —suspected that some polynomials may give poor high-order corr.
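A linear feedback shift register for this polynomial fits in a single 7-bit integer. The bit layout chosen below (least-significant bit holds the newest bit) is one possible convention, not necessarily the one drawn on the slide, and the function name is hypothetical.

```python
def lfsr_bits(state, count):
    """7-bit LFSR for b_n = b_{n-4} XOR b_{n-7}.

    Bit i of `state` (0 = least significant) holds b_{n-1-i}.
    Returns the generated bits and the final register state.
    """
    out = []
    for _ in range(count):
        new_bit = ((state >> 3) ^ (state >> 6)) & 1   # b_{n-4} XOR b_{n-7}
        state = ((state << 1) | new_bit) & 0x7F       # shift in b_n, drop b_{n-7}
        out.append(new_bit)
    return out, state

bits, final_state = lfsr_bits(0b1111111, 127)
print("".join(map(str, bits[:14])))
print(final_state == 0b1111111)     # register returns to its seed after 127 steps
```

Each step costs two shifts, one xor, and one mask, which is what makes the hardware and software implementations cheap.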

Page 19: Random Number Generation - Home - Rice University


Generating k-bit Random Numbers

k-bit random numbers $x_n$ from binary sequence $b_n$
Generalized feedback shift register method (Lewis & Payne ‘73)
    $x_n = 0.b_n b_{n+s} b_{n+2s} \ldots b_{n+(k-1)s}$
• s is carefully selected delay
  —s ≥ k: $x_n$ and $x_j$ have no bits in common for n ≠ j
  —s relatively prime to $2^q - 1$: guarantees full period for $x_n$
• Advantage
  —$x_n$ can be generated very efficiently with wide-word shift and exclusive or operations
• Requires
  —storing an array of seed numbers
  —careful initialization of seed array
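The formula for $x_n$ can be evaluated bit by bit from a Tausworthe stream. The sketch below reuses the $x^7 + x^3 + 1$ recurrence with an all-ones seed; the values k = 4 and s = 5 are illustrative assumptions, and a real implementation would use word-wide xor operations instead of indexing single bits.

```python
def tausworthe_bits(seed_bits, count):
    """Extend seed_bits (b_0..b_6) to `count` bits using b_n = b_{n-4} XOR b_{n-7}."""
    bits = list(seed_bits)
    for n in range(len(bits), count):
        bits.append(bits[n - 4] ^ bits[n - 7])
    return bits

def gfsr_number(bits, n, k, s):
    """x_n = 0.b_n b_{n+s} b_{n+2s} ... b_{n+(k-1)s}, read as a binary fraction."""
    return sum(bits[n + j * s] * 2.0 ** -(j + 1) for j in range(k))

k, s = 4, 5                       # illustrative: s >= k and gcd(s, 127) == 1
bits = tausworthe_bits([1] * 7, 127 + 128 + (k - 1) * s)
xs = [gfsr_number(bits, n, k, s) for n in range(128)]
print(xs[:5])
```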

Page 20: Random Number Generation - Home - Rice University


Extended Fibonacci Generators

• Fibonacci sequence: $x_n = x_{n-1} + x_{n-2}$
• Fibonacci RNG: $x_n = (x_{n-1} + x_{n-2}) \bmod m$
• Properties
  —not very good randomness
    – high serial correlation
• Extended Fibonacci generator (Marsaglia 1983)
    $x_n = (x_{n-5} + x_{n-17}) \bmod 2^k$
  —state: ring buffer with 17 values
  —initialization
    – save integers in 17 values (not all integers even)
    – initialize j=16,k=4 cursors for buffer
  —generate
    – x = B[j] + B[k]
    – B[j] = x
    – j = j -1 mod 17; k = k -1 mod 17
    – return x
• Properties
  —passes most statistical tests
  —period = $2^k(2^{17}-1)$ (much longer than LCGs)
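The ring-buffer steps above can be sketched as a small class and cross-checked against the recurrence evaluated directly. The word size of 32 bits, the seed values, and the class name are assumptions for illustration; the cursor `k` is stored as `self.k` and the exponent k of the modulus as `k_bits` to keep the two apart.

```python
class ExtendedFibonacci:
    """x_n = (x_{n-5} + x_{n-17}) mod 2^k_bits using a 17-entry ring buffer."""

    def __init__(self, seed_values, k_bits=32):
        if len(seed_values) != 17 or all(v % 2 == 0 for v in seed_values):
            raise ValueError("need 17 seed values, not all even")
        self.mask = (1 << k_bits) - 1
        # seed_values are x_0..x_16 in time order; newest goes to index 0, oldest to index 16
        self.buf = [v & self.mask for v in reversed(seed_values)]
        self.j, self.k = 16, 4       # cursors: B[j] = x_{n-17}, B[k] = x_{n-5}

    def next(self):
        x = (self.buf[self.j] + self.buf[self.k]) & self.mask
        self.buf[self.j] = x                 # overwrite the oldest value
        self.j = (self.j - 1) % 17
        self.k = (self.k - 1) % 17
        return x

seed_values = list(range(1, 18))     # illustrative seed containing odd values
gen = ExtendedFibonacci(seed_values)
from_ring = [gen.next() for _ in range(100)]

# Cross-check against the recurrence evaluated on a growing list
direct = list(seed_values)
for n in range(17, 117):
    direct.append((direct[n - 5] + direct[n - 17]) % 2**32)

print(from_ring[:5])
print(from_ring == direct[17:])
```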

Page 21: Random Number Generation - Home - Rice University


Some Combined Generators

Can combine 2 or more generators to produce a better one

• Adding random numbers from 2 or more generators
  —if $x_n$ and $y_n$ are random sequences in [0,m-1], then
    – $w_n = (x_n + y_n) \bmod m$ can be used as a random number
  —why do this?
    – can increase period and randomness if two generators have different periods
• Exclusive-or random numbers from 2 or more generators
  —Santha & Vazirani (1984)
    – xor of 2 random n-bit streams generates a more random sequence
• Shuffle
  —use sequence a to pick which recent element in sequence b to return
  —Marsaglia & Bray (1964)
    – keep 100 items of sequence b
    – use sequence a to select which to return next and replace
  —claim: better k-distributivity than LFSR methods
  —problem: not easy to skip long sequence for multi-stream simulations
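The shuffle scheme can be sketched with two LCGs and a 100-entry table. The parameters of sequence a (multiplier 13, increment 1, seed 7) are hypothetical values picked to satisfy the full-period conditions; sequence b is the 16-bit mixed LCG from the earlier slides.

```python
def lcg(a, b, m, seed):
    """Infinite generator of x_n = (a * x_{n-1} + b) mod m."""
    x = seed
    while True:
        x = (a * x + b) % m
        yield x

def shuffled(gen_a, m_a, gen_b, table_size=100):
    """Use sequence a to choose which buffered element of sequence b to return next."""
    table = [next(gen_b) for _ in range(table_size)]
    while True:
        index = next(gen_a) * table_size // m_a   # map a's value to a table slot
        value = table[index]
        table[index] = next(gen_b)                # replace the element just returned
        yield value

M = 2**16
gen = shuffled(lcg(13, 1, M, seed=7), M, lcg(25173, 13849, M, seed=1))
outputs = [next(gen) for _ in range(1000)]
print(outputs[:5])
```

Because the output order depends on both sequences, jumping ahead by a fixed number of values is awkward, which is the multi-stream problem noted above.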

Page 22: Random Number Generation - Home - Rice University


Seed Selection Issues

• Wrong combination of seed and RNG can hurt
  —especially if RNG is flawed
    – e.g. seed might be RNG fixed point
• Cases
  —one stream needed
    – if RNG has full period, then any seed as good as another
  —multiple streams needed
    – e.g. queue simulation requires
      • interarrival time stream
      • service time stream
    – requires special care!

Page 23: Random Number Generation - Home - Rice University


Seed Selection Guidelines I

• Don’t use 0
  —multiplicative LCGs and Tausworthe generators would stick at 0
• Avoid even values
  —seed should be odd for multiplicative LCG with $m = 2^k$
  —for full period generators, all non-zero values equally good
• Don’t subdivide one stream
  —don’t use a single stream for all random variables
    – might be a strong correlation between items in same stream
• Use non-overlapping streams
  —each stream requires separate seed
    – don’t use same seed for 2 or more streams!
  —if seeds are bad, streams will overlap and not be independent
  —right way: select seeds so streams don’t overlap at all
    – example: need 3 streams of 20,000 numbers
      • pick $u_0$ as seed for first stream
      • pick $u_{20,000}$ as seed for second stream
      • pick $u_{40,000}$ as seed for third stream
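Seeds for non-overlapping streams can be found by stepping the generator forward one stream length at a time. The sketch below uses the 16-bit mixed LCG from the earlier slides because its period of 65,536 is just long enough for three streams of 20,000; the function names and the choice $u_0 = 1$ are hypothetical.

```python
def lcg_step(x, a=25173, b=13849, m=2**16):
    """One step of the 16-bit mixed LCG shown earlier (period 65,536)."""
    return (a * x + b) % m

def stream_seeds(u0, stream_length, num_streams):
    """Seeds u_0, u_L, u_2L, ... found by stepping the generator L values at a time."""
    seeds, x = [], u0
    for _ in range(num_streams):
        seeds.append(x)
        for _ in range(stream_length):
            x = lcg_step(x)
    return seeds

def stream(seed, length):
    """`length` values starting with the seed itself."""
    values, x = [], seed
    for _ in range(length):
        values.append(x)
        x = lcg_step(x)
    return values

seeds = stream_seeds(u0=1, stream_length=20000, num_streams=3)
streams = [stream(s, 20000) for s in seeds]
print(seeds)
print(len(set().union(*map(set, streams))))   # 60000 distinct values means no overlap
```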

Page 24: Random Number Generation - Home - Rice University


Seed Selection Guidelines II

• Reuse seeds in successive replications
  —if simulation experiment is replicated several times
    – can use seeds from end of previous replication in next one
• Don’t use random seeds
  —simulation can’t be reproduced
  —impossible to guarantee multiple streams won’t overlap

Page 25: Random Number Generation - Home - Rice University


Myths I

• A complex set of operations leads to random results
  —complicated code ≠ random sequence of numbers that will pass tests of uniformity and independence
• A single test of goodness suffices
  —sequence 0, 1, …, m-1
    – not random but passes chi-square test
    – will fail run test
  —use as many tests as possible
• Pseudo-random numbers are unpredictable
  —e.g. can identify LCG parameters with a few numbers and predict
  —LCG unsuitable for cryptographic applications where unpredictability is desired
• Some seeds are better than others
  —e.g. odd vs. even, avoid particular seeds, etc.
    $x_n = (9806\,x_{n-1} + 1) \bmod (2^{17} - 1)$: 37,911 is a fixed point!
  —may be true for some generators, but these should be avoided!
  —any non-zero seed should produce equally valid results

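Two of the claims on this slide are easy to check numerically: the fixed point of the example generator, and the fact that the sequence 0, 1, …, m-1 looks perfectly uniform to a chi-square test while consisting of a single ascending run. The sketch below uses m = 1000 and 10 histogram bins as illustrative choices, and a bare run count rather than a full runs test; the function names are hypothetical.

```python
def step(x):
    """x_n = (9806 * x_{n-1} + 1) mod (2^17 - 1)."""
    return (9806 * x + 1) % (2**17 - 1)

print(step(37911))          # a fixed point maps to itself

def chi_square_uniform(values, m, bins):
    """Chi-square statistic for values in [0, m) against a uniform histogram."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // m] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)

def runs_up(values):
    """Number of maximal ascending runs in the sequence."""
    return 1 + sum(1 for prev, cur in zip(values, values[1:]) if cur < prev)

m = 1000
ramp = list(range(m))       # the sequence 0, 1, ..., m-1
print(chi_square_uniform(ramp, m, bins=10))   # perfectly even histogram
print(runs_up(ramp))                          # one long ascending run
```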
Page 26: Random Number Generation - Home - Rice University


Myths II

• Accurate implementation is not important
  —period and randomness are guaranteed only if formula is implemented without overflow or truncation
    – overflows and truncations can
      • change the path of a generator
      • reduce the period
• Bits of successive words are equally-randomly distributed
  —if an algorithm produces a k-bit wide number, randomness is only guaranteed when all k bits are used
  —unless specified otherwise, assume any particular bit position (or sequence thereof) will not be equally random
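How an inexact implementation changes a generator's path can be seen by running the same recurrence in exact integer arithmetic and in emulated single-precision floating point. The choice of single precision is a hypothetical failure mode picked for illustration; the recurrence is the $(9806\,x_{n-1} + 1) \bmod (2^{17} - 1)$ example from the previous slide, and the helper names are assumptions.

```python
import struct

def to_float32(value):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]

M = 2**17 - 1

def exact_step(x):
    return (9806 * x + 1) % M                 # exact integer arithmetic

def float32_step(x):
    product = to_float32(9806.0 * x)          # product keeps only 24 significant bits
    total = to_float32(product + 1.0)
    return int(total) % M

exact, rounded = [1], [1]
for _ in range(5):
    exact.append(exact_step(exact[-1]))
    rounded.append(float32_step(rounded[-1]))

print(exact)
print(rounded)      # diverges once the product exceeds 2^24
```

Once the two paths separate they are unrelated sequences, so none of the guarantees derived for the exact recurrence carry over.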

Page 27: Random Number Generation - Home - Rice University


What’s Used Today: MATLAB

• rand function
  —lagged Fibonacci generator
  —seed
    – cache of 32 floating point numbers
    – combined with a shift register random integer generator
      • core: j ^= (j<<13); j ^= (j>>17); j ^= (j<<5)
  —properties:
    – period: > $2^{1492}$
    – fairly sure all FP numbers in [e/2,1-e/2] are generated
      • e = $2^{-52}$
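The shift-register core can be tried in isolation. The slide does not state the width of `j`; the sketch below assumes a 32-bit unsigned integer and emulates it with a mask, since Python integers do not wrap. The function name is hypothetical, and this is only the integer core, not MATLAB's full `rand`.

```python
MASK32 = 0xFFFFFFFF      # assumption: j is a 32-bit unsigned integer

def xorshift_step(j):
    """One pass of the core: j ^= j<<13; j ^= j>>17; j ^= j<<5 (32-bit wraparound)."""
    j ^= (j << 13) & MASK32
    j ^= j >> 17
    j ^= (j << 5) & MASK32
    return j

j = 1                    # any non-zero start value; 0 would stay 0 forever
values = []
for _ in range(5):
    j = xorshift_step(j)
    values.append(j)
print(values)
```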

Page 28: Random Number Generation - Home - Rice University


What’s Used Today: R

• “Mersenne-Twister” (Matsumoto and Nishimura, 1998) [default]
  —twisted GFSR based on Mersenne primes
  —seed: 624-dimensional set of 32-bit integers + a cursor
  —period: $2^{19937} - 1$
  —equi-distribution in 623 consecutive dimensions (whole period)
  —[note: variant of MT for independent parallel streams exists too]
• “Knuth-TAOCP” (Knuth, 1997)
  —GFSR using lagged Fibonacci sequences with subtraction
    – X[j] = (X[j-100] - X[j-37]) mod $2^{30}$
  —seed: the set of the 100 last numbers + cyclic shift of buffer
  —period: about $2^{129}$
• “Knuth-TAOCP-2002”
  —initialization of GFSR from seed was altered

Page 29: Random Number Generation - Home - Rice University


What’s Used Today: R (continued)

• “Wichmann-Hill”
  —seed: integer vector of length 3
    – seed[i] is in 1:(p[i] - 1)
    – p is the length 3 vector of primes, p = (30269, 30307, 30323)
  —cycle length: 6.9536e12 = prod(p-1)/4
  —reference: Applied Statistics (1984) 33, 123
• “Marsaglia-Multicarry” multiply-with-carry RNG (Marsaglia)
  —seed: two integers, all values allowed
  —period: > $2^{60}$
  —has passed all tests (according to Marsaglia)
• “Super-Duper” (Marsaglia)
  —doesn’t pass the MTUPLE test of the Diehard battery
  —period: about 4.6*10^18 for most initial seeds
  —seed: 2 integers (first: all values allowed; second: odd value)
    – default seeds are the Tausworthe and congruence long integers
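The Wichmann-Hill generator is a combined generator built from three small multiplicative LCGs, so it is short enough to sketch. The primes come from the slide; the multipliers 171, 172, and 170 are taken from the published algorithm (AS 183) rather than from the slide, and the seed below is arbitrary. This is an illustration in Python, not R's implementation.

```python
P = (30269, 30307, 30323)        # primes from the slide
A = (171, 172, 170)              # multipliers from Wichmann & Hill's AS 183 (not on the slide)

def wichmann_hill(seed, count):
    """Combine three small multiplicative LCGs into uniform values in [0, 1)."""
    state = list(seed)
    out = []
    for _ in range(count):
        state = [(a * s) % p for a, s, p in zip(A, state, P)]
        out.append(sum(s / p for s, p in zip(state, P)) % 1.0)   # sum of fractions, mod 1
    return out

us = wichmann_hill((1, 2, 3), 5)           # each seed[i] must lie in 1..p[i]-1
cycle = (P[0] - 1) * (P[1] - 1) * (P[2] - 1) // 4
print(us)
print(f"{cycle:.4e}")                      # cycle length prod(p-1)/4
```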

Page 30: Random Number Generation - Home - Rice University


What’s Used Today: Linux

• random function
  —non-linear additive feedback-based generator
  —state: 8, 32, 64, 128, or 256 bytes
  —all bits considered random
• rand function
  —bottom 12 bits go through cyclic pattern
  —higher-order bits more random