TRANSCRIPT
CHAPTER 14: SIMULATION-BASED OPTIMIZATION I: REGENERATION, COMMON RANDOM NUMBERS, AND RELATED METHODS
• Organization of chapter in ISSO
  – Background
    • Simulation-based optimization vs. model building
  – Regenerative processes
    • Special structure for loss estimation and optimization
  – FDSA and SPSA in simulation-based optimization
  – Improved convergence through common random numbers
  – Discrete optimization via statistical selection
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
14-2
Background: Simulation-Based Optimization
• Optimization arises in two ways in simulation:
  A. Building simulation model (parameter estimation)
  B. Using simulation for optimization of real system, given that problem A has been solved
• Focus here is problem B
• Fundamental goal is to optimize design vector θ in real system; simulation is proxy in optimization process
• Loss function to be minimized, L(θ), represents average system performance at given θ; simulation runs produce noisy (approximate) values of L(θ)
• Appropriate stochastic optimization method yields “intelligent” trial-and-error in choice of θ, i.e., how to run simulation to find best θ
14-3
Background (cont’d)

• Many modern processes are studied by Monte Carlo simulation (manufacturing, defense, epidemiological, transportation, etc.)
• Loss functions for such systems typically have form
  L(θ) = E[Q(θ, V)],
  where Q(•) represents a function describing output of process based on Monte Carlo random effects in V
  – Simulation produces sample replications of Q(θ, V) (typically one simulation produces one value of Q(•))
  – Examples of Q(•) might be defective products in manufacturing process, accuracy of weapon system, disease incidence in particular population, cumulative vehicle wait time at traffic signals, etc.
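To make this concrete, here is a minimal Python sketch (not from ISSO): the quadratic Q below is a hypothetical stand-in for a real simulation output, and L(θ) is estimated by the sample mean of independent replications of Q(θ, V).

```python
import numpy as np

def Q(theta, V):
    """Hypothetical simulation output: quadratic cost perturbed by
    Monte Carlo random effects V (a stand-in for a real simulation run)."""
    return np.sum((theta - 1.0)**2) + V @ theta

def estimate_loss(theta, n_runs=10_000, seed=0):
    """Estimate L(theta) = E[Q(theta, V)] by averaging n_runs
    independent replications of Q(theta, V)."""
    rng = np.random.default_rng(seed)
    samples = [Q(theta, rng.standard_normal(theta.size)) for _ in range(n_runs)]
    return np.mean(samples)

theta = np.zeros(2)
print(estimate_loss(theta))  # noisy estimate; true L(theta) = 2 for this stub
```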
14-4
Background (cont’d)

• Important assumption is that simulation is faithful representation of true system
• Recall that overall goal is to find θ that minimizes mean value of Q(θ, V)
  – Equivalent to optimizing average performance of true system
  – Simulation-based optimization rests critically on simulation and true system being statistically equivalent
• As with earlier chapters, need optimization method to cope with noise in input information
  – Noisy measurements of loss function and/or gradient of loss function
• Focus in this chapter is simulation-based optimization without direct (noisy or noise-free) gradient information
14-5
Comments on Gradient-Based and Gradient-Free Methods

• In complex simulations, ∂L/∂θ (for use in deterministic optimization such as steepest descent) or ∂Q/∂θ (for use in stochastic gradient search [Chap. 5]) often not available
  – “Automatic differentiation” techniques (e.g., Griewank and Corliss, 1991) also usually infeasible due to software and storage requirements
• Optimize by using simulations to produce Q(θ, V) for varying θ and V
• Unlike ∂Q/∂θ (and E[Q(θ, V)]), Q(θ, V) is available in even the most complex simulations
  – Can use gradient-free optimization that allows for noisy loss measurements (since E[Q(θ, V)] = L(θ), i.e., Q(θ, V) = L(θ) + noise)
  – Appropriate stochastic approximation methods (e.g., FDSA, SPSA, etc.) may be used based on measurements Q(θ, V)
14-6
Regenerative Systems
• Common issue in simulation of dynamic systems is choice of amount of time to be represented
• Regeneration is useful for addressing issue
• Regenerative systems have property of returning periodically to some particular probabilistic state; system effectively starts anew with each period
• Queuing systems are common examples
  – Day-to-day traffic flow; inventory control; communications networks; etc.
• Advantage is that regeneration periods may be considered i.i.d. random processes
• Typical loss has form:
  L(θ) = E(cost per period) / E(length of period)
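As an illustration of this cost/length structure (a sketch under assumptions, not ISSO’s example): the M/M/1-style queue below regenerates whenever a customer arrives to an empty system, and per-cycle “cost” is taken, purely for illustration, to be total waiting time in the cycle.

```python
import numpy as np

def regenerative_cycles(arrival_rate, service_rate, n_customers=100_000, seed=1):
    """Simulate waiting times via the Lindley recursion and split the run into
    regeneration cycles, which start whenever a customer arrives to find the
    system empty (waiting time 0). Returns per-cycle (cost, length) pairs,
    where cost = total wait in the cycle and length = number of customers."""
    rng = np.random.default_rng(seed)
    A = rng.exponential(1 / arrival_rate, n_customers)   # interarrival times
    S = rng.exponential(1 / service_rate, n_customers)   # service times
    cycles = []
    W = 0.0                   # waiting time of current customer
    cost, length = 0.0, 1     # customer 0 arrives to empty system, waits 0
    for i in range(1, n_customers):
        W = max(0.0, W + S[i - 1] - A[i])   # Lindley recursion
        if W == 0.0:          # arrival to empty system: regeneration point
            cycles.append((cost, length))
            cost, length = 0.0, 0
        cost += W
        length += 1
    return cycles             # trailing partial cycle is discarded

cycles = regenerative_cycles(arrival_rate=0.5, service_rate=1.0)
costs, lengths = map(np.array, zip(*cycles))
# Ratio-of-expectations loss: E(cost per period) / E(length of period)
print(costs.mean() / lengths.mean())   # ≈ 1.0, the M/M/1 mean wait for these rates
```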
14-7
Queuing System with Regeneration; Periods Begin with Arrivals 1, 3, 4, 7, 11, 16
(Example 14.2 in ISSO)
14-8
Care Needed in Loss Estimators for Optimization of Regenerative Systems

• Optimization of θ commonly based on unbiased estimators of L(θ) and/or gradient
• Straightforward estimator of L(θ) is
  L̂(θ) = (sample mean of cost per period) / (sample mean of length of period)
• Above estimator is biased in general (i.e., E[L̂(θ)] ≠ L(θ))
  – Biasedness follows from relationship E(1/X) ≥ 1/E(X) for positive random variable X
  – L̂(θ) not acceptable estimator of L(θ) in general
• Special cases may eliminate or minimize bias (e.g., when length of period is deterministic; see Sect. 14.2 of ISSO)
  – For such special cases, L̂(θ) is acceptable estimator for use in optimization
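A quick numerical check of this bias (illustrative numbers, not from ISSO): with independent cost and length variables, the ratio of sample means overshoots the true ratio of means when the number of periods is small.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cycles, n_reps = 5, 100_000   # few cycles per estimate => visible bias

# Cost per period C and length of period X, with true L = E[C]/E[X] = 2/4 = 0.5
ratio_estimates = []
for _ in range(n_reps):
    C = rng.exponential(2.0, n_cycles)
    X = rng.exponential(4.0, n_cycles)
    ratio_estimates.append(C.mean() / X.mean())  # L-hat: ratio of sample means

# Mean is noticeably above 0.5, since E(1/X-bar) >= 1/E(X-bar) (Jensen)
print(np.mean(ratio_estimates))
```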
14-9
FDSA and SPSA in Simulation-Based Optimization

• Stochastic approximation provides ideal framework for carrying out simulation-based optimization
  – Rigorous means for handling noisy loss information inherent in Monte Carlo simulation: y(θ) = Q(θ, V) = L(θ) + noise
  – Most other optimization methods (GAs, nonlinear programming, etc.) apply only on ad hoc basis
• “…FDSA, or some variant of it, remains the method of choice for the majority of practitioners” (Fu and Hu, 1997)
  – No need to know “inner workings” of simulation, as in gradient-based methods such as IPA, LR/SF, etc.
• FDSA and SPSA-type methods much easier to use than gradient-based methods as they only require simulation inputs/outputs
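A minimal SPSA sketch along these lines (the gain values and the noisy-loss stub y below are illustrative choices, not prescriptions from ISSO); note that the algorithm touches the simulation only through the two measurements y(θ ± c_k Δ_k):

```python
import numpy as np

def y(theta, rng):
    """Noisy loss measurement from one simulation run:
    Q(theta, V) = L(theta) + noise, with L(theta) = ||theta||^2 here."""
    return np.sum(theta**2) + rng.standard_normal()

def spsa(theta, n_iter=5000, a=0.1, c=0.1, A=100, alpha=0.602, gamma=0.101, seed=3):
    rng = np.random.default_rng(seed)
    for k in range(n_iter):
        ak = a / (k + 1 + A)**alpha                       # step-size gain
        ck = c / (k + 1)**gamma                           # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.size)  # Bernoulli +/-1 perturbation
        # Gradient estimate from two noisy loss measurements only
        g_hat = (y(theta + ck * delta, rng) - y(theta - ck * delta, rng)) / (2 * ck * delta)
        theta = theta - ak * g_hat
    return theta

print(spsa(np.ones(10)))   # should approach the minimizer theta* = 0
```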
14-10
Common Random Numbers

• Common random numbers (CRNs) provide a way of improving simulation-based optimization by reusing the Monte-Carlo-generated random variables
• CRNs based on the famous formula for two random variables X, Y:
  var(X − Y) = var(X) + var(Y) − 2 cov(X, Y)
• Maximizing the covariance minimizes the variance of the difference
• The aim of CRNs is to reduce variability of the gradient estimate
  ⇒ Improves convergence in algorithm
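A small illustration of the formula with hypothetical quantities: the same difference X − Y computed with common vs. independent random numbers.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
U = rng.random(n)            # common uniform random numbers
U2 = rng.random(n)           # independent stream

X = np.exp(U)
Y_crn = np.exp(0.9 * U)      # CRN: reuses U, so X and Y are highly correlated
Y_ind = np.exp(0.9 * U2)     # independent: cov(X, Y) is near zero

print(np.var(X - Y_crn))     # small: the 2*cov term cancels most variance
print(np.var(X - Y_ind))     # larger: roughly var(X) + var(Y)
```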
14-11
CRNs (cont’d)

• For SPSA, the gradient variability is largely driven by the numerator
  y(θ̂_k + c_k Δ_k) − y(θ̂_k − c_k Δ_k)
• Two effects contribute to variability:
  (i) difference due to perturbations ±c_k Δ_k (desirable)
  (ii) difference due to noise effects in measurements (undesirable)
• CRNs useful for reducing undesirable variability in (ii)
• Using CRNs maximizes covariance between two y(•) values in numerator
  ⇒ Minimizes variance of difference
14-12
CRNs (cont’d)

• In simulation (vs. most real systems) some form of CRNs is often feasible
• The essence of CRN is to use same random numbers in both y(θ̂_k + c_k Δ_k) and y(θ̂_k − c_k Δ_k)
  – Achieved by using same random number seed for both simulations and synchronizing the random numbers
• Optimal rate of convergence of iterate to θ* (à la k^{−γ/2}) is k^{−1/2} (Kleinman et al., 1999); this rate is same as stochastic gradient-based method
  – Rate is improvement on optimal non-CRN rate of k^{−1/3}
• Unfortunately, “pure CRN” may not be feasible in large-scale simulations due to violating synchronization requirement
  – e.g., if θ represents service rates in a queuing system, difference between θ̂_k + c_k Δ_k and θ̂_k − c_k Δ_k may allow additional (stochastic) arrivals to be serviced in one case
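A sketch of this seeding discipline (assuming the simulation accepts a seed, so both y(•) evaluations at iteration k consume identical random number streams, with a fresh seed drawn at each iteration):

```python
import numpy as np

def y(theta, seed):
    """One simulation run, fully determined by theta and the seed, so that
    two runs with the same seed use common random numbers."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal(theta.size)
    return np.sum(theta**2) + V @ theta   # Q(theta, V) = L(theta) + noise

def crn_gradient_estimate(theta, ck, master_rng):
    delta = master_rng.choice([-1.0, 1.0], size=theta.size)
    seed = int(master_rng.integers(2**32))  # new seed at each iteration...
    y_plus = y(theta + ck * delta, seed)    # ...but the SAME seed for both
    y_minus = y(theta - ck * delta, seed)   # measurements (the CRN step)
    return (y_plus - y_minus) / (2 * ck * delta)

master = np.random.default_rng(5)
print(crn_gradient_estimate(np.ones(4), ck=0.1, master_rng=master))
```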
14-13
Numerical Illustration (Example 14.8 in ISSO)

• Simulation using exponentially distributed random variables and loss function with p = dim(θ) = 10
• Goal is to compare CRN and non-CRN; θ* is minimizing value for L(θ)
• Table below shows improved accuracy of solution under CRNs; plot on next slide compares rate of convergence

Mean values of ‖θ̂_n − θ*‖ / ‖θ̂_0 − θ*‖:

  Total iterations n    CRNs       Non-CRNs
  1000                  0.02195    0.04103
  10,000                0.00658    0.01845
  100,000               0.00207    0.00819
14-14
Rates of Convergence for CRN and Non-CRN (Example 14.9 in ISSO)

[Figure: mean values of n^{γ/2} ‖θ̂_n − θ*‖ plotted against n on a log scale (n = 100 to 100,000) for three cases: Non-CRN with γ = 2/3, Non-CRN with γ = 1, and CRN with γ = 1]
14-15
Partial CRNs

• By using the same random number seed for y(θ̂_k + c_k Δ_k) and y(θ̂_k − c_k Δ_k), it is possible to achieve a partial CRN
• Some of the events in the simulations will be synchronized due to common seed
  – Synchronization is likely to break down during course of simulation, especially for small k when c_k is relatively large
• Asymptotic analysis produces convergence rate identical to pure CRN since synchronization occurs as c_k → 0
  – Also require new seed for simulations at each iteration (common for both y(•) values) to ensure convergence to θ* (θ* minimizes L(θ) = E[Q(θ, V)])
• In partial CRN, practical finite-sample rate of convergence for SPSA tends to be lower than in pure CRN setting
14-16
Numerical Example: Partial CRNs
(Kleinman et al., 1999; see p. 398 of ISSO)

• A simulation using exponentially distributed random variables was conducted in Kleinman et al. (1999) for p = dim(θ) = 10
  – Simulation designed so that it is possible to implement pure CRN (not available in most practical simulations)
• Purpose is to evaluate relative performance of non-CRN, partial CRN, and pure CRN
14-17
Numerical Example (cont’d)

• Numerical results for 100 replications of SPSA and FDSA (no. of y(•) measurements in SPSA and FDSA are equal, with total iterations of 10,000 and 1000 respectively); entries are mean values of ‖θ̂_n − θ*‖ / ‖θ̂_0 − θ*‖:

                 SPSA (n = 10,000)    FDSA (n = 1000)
  Non-CRN        0.0190               0.0410
  Partial CRN    0.0071               0.0110
  Pure CRN       0.0065               0.0064

• Partial CRN offers significant improvement over non-CRN, and SPSA outperforms FDSA (except in idealized pure CRN case)
14-18
Indifference Zone Methods for Choosing Best Option

• Consider use of simulation to determine the best of K possible options, represented θ_1, θ_2, …, θ_K
• Simulation produces noisy loss measurements y_k(θ_i)
  – Other methods for discrete optimization (e.g., random search, simulated annealing, genetic algorithms, etc.) generally inappropriate
• Suppose analyst is willing to accept any θ_i such that L(θ_i) is in indifference zone [L(θ*), L(θ*) + δ)
• Analyst can specify α such that
  P(correct selection) ≥ 1 − α
  whenever L(θ_i) − L(θ*) ≥ δ for all θ_i ≠ θ*
• Can use independent sampling or common random numbers (steps for independent sampling on next slide)
14-19
Two-Stage Indifference Zone Selection with Independent Sampling

• Step 0 (initialization) Choose α, δ, and initial sample size n_0
• Step 1 (first stage) Run simulation n_0 times at each θ_i
• Step 2 (variance estimation) Compute sample variance at each θ_i
• Step 3 (sample sizes) Using above variance estimates and table look-up, compute the total sample size n_i at each θ_i
• Step 4 (second stage) Run simulation n_i − n_0 additional times at each θ_i
• Step 5 (sample means) Compute sample means of simulation outputs at each θ_i over all n_i runs
• Step 6 (decision step) Select the θ_i corresponding to the lowest sample mean from step 5
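A sketch of these six steps (the constant h standing in for the step-3 table look-up, and the simulation stub, are hypothetical):

```python
import numpy as np

def two_stage_selection(simulate, K, delta, h, n0=20, seed=6):
    """Two-stage indifference-zone selection with independent sampling.
    simulate(i, rng) returns one noisy loss measurement y(theta_i);
    h plays the role of the tabled constant from step 3."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: first stage and per-option sample variances
    first = [np.array([simulate(i, rng) for _ in range(n0)]) for i in range(K)]
    s2 = [stage.var(ddof=1) for stage in first]
    # Step 3: total sample size for each option
    n = [max(n0, int(np.ceil(h**2 * s2_i / delta**2))) for s2_i in s2]
    # Steps 4-5: second stage and overall sample means
    means = [
        np.concatenate([first[i], [simulate(i, rng) for _ in range(n[i] - n0)]]).mean()
        for i in range(K)
    ]
    return int(np.argmin(means))   # step 6: lowest sample mean wins

# Hypothetical options: theta_i has true loss i/10 plus unit-variance noise
sim = lambda i, rng: i / 10 + rng.standard_normal()
print(two_stage_selection(sim, K=4, delta=0.1, h=3.0))   # likely selects option 0
```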
14-20
Two-Stage Indifference Zone Selection with CRN (Dependent) Sampling

• Step 0 (initialization) Choose α, δ, and initial sample size n_0
• Step 1 (first stage) Run simulation n_0 times at each θ_i; the kth simulation runs for the θ_i are dependent
• Step 2 (variance estimation) Compute overall sample variance for the Kn_0 runs
• Step 3 (sample sizes) Using above variance estimate and table look-up, compute total sample size n; n applies for all θ_i
• Step 4 (second stage) Run simulation n − n_0 additional times at each θ_i
• Step 5 (sample means) Compute sample means of simulation outputs at each θ_i over all n runs
• Step 6 (decision step) Select the θ_i corresponding to the lowest sample mean from step 5