TRANSCRIPT
CHAPTER 14: SIMULATION-BASED OPTIMIZATION I: REGENERATION, COMMON RANDOM NUMBERS, AND RELATED METHODS
• Organization of chapter in ISSO
  – Background
    • Simulation-based optimization vs. model building
  – Regenerative processes
    • Special structure for loss estimation and optimization
  – FDSA and SPSA in simulation-based optimization
  – Improved convergence through common random numbers
  – Discrete optimization via statistical selection
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
14-2
Background: Simulation-Based Optimization
• Optimization arises in two ways in simulation:
  A. Building simulation model (parameter estimation)
  B. Using simulation for optimization of real system, given that problem A has been solved
• Focus here is problem B
• Fundamental goal is to optimize design vector θ in real system; simulation is proxy in optimization process
• Loss function to be minimized, L(θ), represents average system performance at given θ; simulation runs produce noisy (approximate) values of L(θ)
• Appropriate stochastic optimization method yields “intelligent” trial-and-error in choice of θ, i.e., how to run simulation to find best θ
14-3
Background (cont’d)

• Many modern processes are studied by Monte Carlo simulation (manufacturing, defense, epidemiological, transportation, etc.)
• Loss functions for such systems typically have form
  L(θ) = E[Q(θ, V)],
  where Q(•) represents a function describing output of process based on Monte Carlo random effects in V
  – Simulation produces sample replications of Q(θ, V) (typically one simulation produces one value of Q(•))
  – Examples of Q(•) might be defective products in manufacturing process, accuracy of weapon system, disease incidence in particular population, cumulative vehicle wait time at traffic signals, etc.
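To make this concrete, here is a minimal Python sketch (not from ISSO): the quadratic Q below is a hypothetical stand-in for a real simulation output, and L(θ) is estimated by the sample mean of independent replications of Q(θ, V).

```python
import numpy as np

def Q(theta, V):
    """Hypothetical simulation output: quadratic cost perturbed by
    Monte Carlo random effects V (a stand-in for a real simulation run)."""
    return np.sum((theta - 1.0)**2) + V @ theta

def estimate_loss(theta, n_runs=10_000, seed=0):
    """Estimate L(theta) = E[Q(theta, V)] by averaging n_runs
    independent replications of Q(theta, V)."""
    rng = np.random.default_rng(seed)
    samples = [Q(theta, rng.standard_normal(theta.size)) for _ in range(n_runs)]
    return np.mean(samples)

theta = np.zeros(2)
print(estimate_loss(theta))  # noisy estimate; true L(theta) = 2 for this stub
```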
14-4
Background (cont’d)

• Important assumption is that simulation is faithful representation of true system
• Recall that overall goal is to find θ that minimizes mean value of Q(θ, V)
  – Equivalent to optimizing average performance of true system
  – Simulation-based optimization rests critically on simulation and true system being statistically equivalent
• As with earlier chapters, need optimization method to cope with noise in input information
  – Noisy measurements of loss function and/or gradient of loss function
• Focus in this chapter is simulation-based optimization without direct (noisy or noise-free) gradient information
14-5
Comments on Gradient-Based and Gradient-Free Methods

• In complex simulations, ∂L/∂θ (for use in deterministic optimization such as steepest descent) or ∂Q/∂θ (for use in stochastic gradient search [Chap. 5]) often not available
  – “Automatic differentiation” techniques (e.g., Griewank and Corliss, 1991) also usually infeasible due to software and storage requirements
• Optimize by using simulations to produce Q(θ, V) for varying θ and V
• Unlike ∂Q/∂θ (and E[Q(θ, V)]), Q(θ, V) is available in even the most complex simulations
  – Can use gradient-free optimization that allows for noisy loss measurements (since E[Q(θ, V)] = L(θ), i.e., Q(θ, V) = L(θ) + noise)
  – Appropriate stochastic approximation methods (e.g., FDSA, SPSA, etc.) may be used based on measurements Q(θ, V)
14-6
Regenerative Systems
• Common issue in simulation of dynamic systems is choice of amount of time to be represented
• Regeneration is useful for addressing issue
• Regenerative systems have property of returning periodically to some particular probabilistic state; system effectively starts anew with each period
• Queuing systems are common examples
  – Day-to-day traffic flow; inventory control; communications networks; etc.
• Advantage is that regeneration periods may be considered i.i.d. random processes
• Typical loss has form:
  L(θ) = E(cost per period) / E(length of period)
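As an illustration of this cost/length structure (a sketch under assumptions, not ISSO’s example): the M/M/1-style queue below regenerates whenever a customer arrives to an empty system, and per-cycle “cost” is taken, purely for illustration, to be total waiting time in the cycle.

```python
import numpy as np

def regenerative_cycles(arrival_rate, service_rate, n_customers=100_000, seed=1):
    """Simulate waiting times via the Lindley recursion and split the run into
    regeneration cycles, which start whenever a customer arrives to find the
    system empty (waiting time 0). Returns per-cycle (cost, length) pairs,
    where cost = total wait in the cycle and length = number of customers."""
    rng = np.random.default_rng(seed)
    A = rng.exponential(1 / arrival_rate, n_customers)   # interarrival times
    S = rng.exponential(1 / service_rate, n_customers)   # service times
    cycles = []
    W = 0.0                   # waiting time of current customer
    cost, length = 0.0, 1     # customer 0 arrives to empty system, waits 0
    for i in range(1, n_customers):
        W = max(0.0, W + S[i - 1] - A[i])   # Lindley recursion
        if W == 0.0:          # arrival to empty system: regeneration point
            cycles.append((cost, length))
            cost, length = 0.0, 0
        cost += W
        length += 1
    return cycles             # trailing partial cycle is discarded

cycles = regenerative_cycles(arrival_rate=0.5, service_rate=1.0)
costs, lengths = map(np.array, zip(*cycles))
# Ratio-of-expectations loss: E(cost per period) / E(length of period)
print(costs.mean() / lengths.mean())   # ≈ 1.0, the M/M/1 mean wait for these rates
```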
14-7
Queuing System with Regeneration; Periods Begin with Arrivals 1, 3, 4, 7, 11, 16
(Example 14.2 in ISSO)
14-8
Care Needed in Loss Estimators for Optimization of Regenerative Systems

• Optimization of θ commonly based on unbiased estimators of L(θ) and/or gradient
• Straightforward estimator of L(θ) is
  L̂(θ) = (sample mean of cost per period) / (sample mean of length of period)
• Above estimator is biased in general (i.e., E[L̂(θ)] ≠ L(θ))
  – Biasedness follows from relationship E(1/X) ≥ 1/E(X) for positive random variable X
  – L̂(θ) not acceptable estimator of L(θ) in general
• Special cases may eliminate or minimize bias (e.g., when length of period is deterministic; see Sect. 14.2 of ISSO)
  – For such special cases, L̂(θ) is acceptable estimator for use in optimization
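A quick numerical check of this bias (illustrative numbers, not from ISSO): with independent cost and length variables, the ratio of sample means overshoots the true ratio of means when the number of periods is small.

```python
import numpy as np

rng = np.random.default_rng(2)
n_cycles, n_reps = 5, 100_000   # few cycles per estimate => visible bias

# Cost per period C and length of period X, with true L = E[C]/E[X] = 2/4 = 0.5
ratio_estimates = []
for _ in range(n_reps):
    C = rng.exponential(2.0, n_cycles)
    X = rng.exponential(4.0, n_cycles)
    ratio_estimates.append(C.mean() / X.mean())  # L-hat: ratio of sample means

# Mean is noticeably above 0.5, since E(1/X-bar) >= 1/E(X-bar) (Jensen)
print(np.mean(ratio_estimates))
```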
14-9
FDSA and SPSA in Simulation-Based Optimization

• Stochastic approximation provides ideal framework for carrying out simulation-based optimization
  – Rigorous means for handling noisy loss information inherent in Monte Carlo simulation: y(θ) = Q(θ, V) = L(θ) + noise
  – Most other optimization methods (GAs, nonlinear programming, etc.) apply only on ad hoc basis
• “…FDSA, or some variant of it, remains the method of choice for the majority of practitioners” (Fu and Hu, 1997)
  – No need to know “inner workings” of simulation, as in gradient-based methods such as IPA, LR/SF, etc.
• FDSA and SPSA-type methods much easier to use than gradient-based methods as they only require simulation inputs/outputs
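A minimal SPSA sketch along these lines (the gain values and the noisy-loss stub y below are illustrative choices, not prescriptions from ISSO); note that the algorithm touches the simulation only through the two measurements y(θ ± c_k Δ_k):

```python
import numpy as np

def y(theta, rng):
    """Noisy loss measurement from one simulation run:
    Q(theta, V) = L(theta) + noise, with L(theta) = ||theta||^2 here."""
    return np.sum(theta**2) + rng.standard_normal()

def spsa(theta, n_iter=5000, a=0.1, c=0.1, A=100, alpha=0.602, gamma=0.101, seed=3):
    rng = np.random.default_rng(seed)
    for k in range(n_iter):
        ak = a / (k + 1 + A)**alpha                       # step-size gain
        ck = c / (k + 1)**gamma                           # perturbation gain
        delta = rng.choice([-1.0, 1.0], size=theta.size)  # Bernoulli +/-1 perturbation
        # Gradient estimate from two noisy loss measurements only
        g_hat = (y(theta + ck * delta, rng) - y(theta - ck * delta, rng)) / (2 * ck * delta)
        theta = theta - ak * g_hat
    return theta

print(spsa(np.ones(10)))   # should approach the minimizer theta* = 0
```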
14-10
Common Random Numbers

• Common random numbers (CRNs) provide a way of improving simulation-based optimization by reusing the Monte-Carlo-generated random variables
• CRNs based on the famous formula for two random variables X, Y:
  var(X − Y) = var(X) + var(Y) − 2 cov(X, Y)
• Maximizing the covariance minimizes the variance of the difference
• The aim of CRNs is to reduce variability of the gradient estimate
  ⇒ Improves convergence in algorithm
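A small illustration of the formula with hypothetical quantities: the same difference X − Y computed with common vs. independent random numbers.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 100_000
U = rng.random(n)            # common uniform random numbers
U2 = rng.random(n)           # independent stream

X = np.exp(U)
Y_crn = np.exp(0.9 * U)      # CRN: reuses U, so X and Y are highly correlated
Y_ind = np.exp(0.9 * U2)     # independent: cov(X, Y) is near zero

print(np.var(X - Y_crn))     # small: the 2*cov term cancels most variance
print(np.var(X - Y_ind))     # larger: roughly var(X) + var(Y)
```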
14-11
CRNs (cont’d)

• For SPSA, the gradient variability is largely driven by the numerator
  y(θ̂_k + c_k Δ_k) − y(θ̂_k − c_k Δ_k)
• Two effects contribute to variability:
  (i) difference due to perturbations ±c_k Δ_k (desirable)
  (ii) difference due to noise effects in measurements (undesirable)
• CRNs useful for reducing undesirable variability in (ii)
• Using CRNs maximizes covariance between two y(•) values in numerator
  ⇒ Minimizes variance of difference
14-12
CRNs (cont’d)

• In simulation (vs. most real systems) some form of CRNs is often feasible
• The essence of CRN is to use same random numbers in both y(θ̂_k + c_k Δ_k) and y(θ̂_k − c_k Δ_k)
  – Achieved by using same random number seed for both simulations and synchronizing the random numbers
• Optimal rate of convergence of iterate to θ* (à la k^{−γ/2}) is k^{−1/2} (Kleinman et al., 1999); this rate is same as stochastic gradient-based method
  – Rate is improvement on optimal non-CRN rate of k^{−1/3}
• Unfortunately, “pure CRN” may not be feasible in large-scale simulations due to violating synchronization requirement
  – e.g., if θ represents service rates in a queuing system, difference between θ̂_k + c_k Δ_k and θ̂_k − c_k Δ_k may allow additional (stochastic) arrivals to be serviced in one case
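A sketch of this seeding discipline (assuming the simulation accepts a seed, so both y(•) evaluations at iteration k consume identical random number streams, with a fresh seed drawn at each iteration):

```python
import numpy as np

def y(theta, seed):
    """One simulation run, fully determined by theta and the seed, so that
    two runs with the same seed use common random numbers."""
    rng = np.random.default_rng(seed)
    V = rng.standard_normal(theta.size)
    return np.sum(theta**2) + V @ theta   # Q(theta, V) = L(theta) + noise

def crn_gradient_estimate(theta, ck, master_rng):
    delta = master_rng.choice([-1.0, 1.0], size=theta.size)
    seed = int(master_rng.integers(2**32))  # new seed at each iteration...
    y_plus = y(theta + ck * delta, seed)    # ...but the SAME seed for both
    y_minus = y(theta - ck * delta, seed)   # measurements (the CRN step)
    return (y_plus - y_minus) / (2 * ck * delta)

master = np.random.default_rng(5)
print(crn_gradient_estimate(np.ones(4), ck=0.1, master_rng=master))
```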
14-13
Numerical Illustration (Example 14.8 in ISSO)

• Simulation using exponentially distributed random variables and loss function with p = dim(θ) = 10
• Goal is to compare CRN and non-CRN; θ* is minimizing value for L(θ)
• Table below shows improved accuracy of solution under CRNs; plot on next slide compares rate of convergence

Mean values of ‖θ̂_n − θ*‖ / ‖θ̂_0 − θ*‖:

  Total iterations n    CRNs       Non-CRNs
  1000                  0.02195    0.04103
  10,000                0.00658    0.01845
  100,000               0.00207    0.00819
14-14
Rates of Convergence for CRN and Non-CRN (Example 14.9 in ISSO)

[Figure: mean values of n^{γ/2} ‖θ̂_n − θ*‖ plotted against n on a log scale (n = 100 to 100,000) for three cases: Non-CRN with γ = 2/3, Non-CRN with γ = 1, and CRN with γ = 1]
14-15
Partial CRNs

• By using the same random number seed for y(θ̂_k + c_k Δ_k) and y(θ̂_k − c_k Δ_k), it is possible to achieve a partial CRN
• Some of the events in the simulations will be synchronized due to common seed
  – Synchronization is likely to break down during course of simulation, especially for small k when c_k is relatively large
• Asymptotic analysis produces convergence rate identical to pure CRN since synchronization occurs as c_k → 0
  – Also require new seed for simulations at each iteration (common for both y(•) values) to ensure convergence to θ* (θ* minimizes L(θ) = E[Q(θ, V)])
• In partial CRN, practical finite-sample rate of convergence for SPSA tends to be lower than in pure CRN setting
14-16
Numerical Example: Partial CRNs
(Kleinman et al., 1999; see p. 398 of ISSO)

• A simulation using exponentially distributed random variables was conducted in Kleinman et al. (1999) for p = dim(θ) = 10
  – Simulation designed so that it is possible to implement pure CRN (not available in most practical simulations)
• Purpose is to evaluate relative performance of non-CRN, partial CRN, and pure CRN
14-17
Numerical Example (cont’d)

• Numerical results for 100 replications of SPSA and FDSA (no. of y(•) measurements in SPSA and FDSA are equal, with total iterations of 10,000 and 1000 respectively); entries are mean values of ‖θ̂_n − θ*‖ / ‖θ̂_0 − θ*‖:

                 SPSA (n = 10,000)    FDSA (n = 1000)
  Non-CRN        0.0190               0.0410
  Partial CRN    0.0071               0.0110
  Pure CRN       0.0065               0.0064

• Partial CRN offers significant improvement over non-CRN, and SPSA outperforms FDSA (except in idealized pure CRN case)
14-18
Indifference Zone Methods for Choosing Best Option

• Consider use of simulation to determine the best of K possible options, represented θ_1, θ_2, …, θ_K
• Simulation produces noisy loss measurements y_k(θ_i)
  – Other methods for discrete optimization (e.g., random search, simulated annealing, genetic algorithms, etc.) generally inappropriate
• Suppose analyst is willing to accept any θ_i such that L(θ_i) is in indifference zone [L(θ*), L(θ*) + δ)
• Analyst can specify α such that
  P(correct selection) ≥ 1 − α
  whenever L(θ_i) − L(θ*) ≥ δ for all θ_i ≠ θ*
• Can use independent sampling or common random numbers (steps for independent sampling on next slide)
14-19
Two-Stage Indifference Zone Selection with Independent Sampling

• Step 0 (initialization) Choose α, δ, and initial sample size n_0
• Step 1 (first stage) Run simulation n_0 times at each θ_i
• Step 2 (variance estimation) Compute sample variance at each θ_i
• Step 3 (sample sizes) Using above variance estimates and table look-up, compute the total sample size n_i at each θ_i
• Step 4 (second stage) Run simulation n_i − n_0 additional times at each θ_i
• Step 5 (sample means) Compute sample means of simulation outputs at each θ_i over all n_i runs
• Step 6 (decision step) Select the θ_i corresponding to the lowest sample mean from step 5
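A sketch of these six steps (the constant h standing in for the step-3 table look-up, and the simulation stub, are hypothetical):

```python
import numpy as np

def two_stage_selection(simulate, K, delta, h, n0=20, seed=6):
    """Two-stage indifference-zone selection with independent sampling.
    simulate(i, rng) returns one noisy loss measurement y(theta_i);
    h plays the role of the tabled constant from step 3."""
    rng = np.random.default_rng(seed)
    # Steps 1-2: first stage and per-option sample variances
    first = [np.array([simulate(i, rng) for _ in range(n0)]) for i in range(K)]
    s2 = [stage.var(ddof=1) for stage in first]
    # Step 3: total sample size for each option
    n = [max(n0, int(np.ceil(h**2 * s2_i / delta**2))) for s2_i in s2]
    # Steps 4-5: second stage and overall sample means
    means = [
        np.concatenate([first[i], [simulate(i, rng) for _ in range(n[i] - n0)]]).mean()
        for i in range(K)
    ]
    return int(np.argmin(means))   # step 6: lowest sample mean wins

# Hypothetical options: theta_i has true loss i/10 plus unit-variance noise
sim = lambda i, rng: i / 10 + rng.standard_normal()
print(two_stage_selection(sim, K=4, delta=0.1, h=3.0))   # likely selects option 0
```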
14-20
Two-Stage Indifference Zone Selection with CRN (Dependent) Sampling

• Step 0 (initialization) Choose α, δ, and initial sample size n_0
• Step 1 (first stage) Run simulation n_0 times at each θ_i; the kth simulation runs for the θ_i are dependent
• Step 2 (variance estimation) Compute overall sample variance for the Kn_0 runs
• Step 3 (sample sizes) Using above variance estimate and table look-up, compute total sample size n; n applies for all θ_i
• Step 4 (second stage) Run simulation n − n_0 additional times at each θ_i
• Step 5 (sample means) Compute sample means of simulation outputs at each θ_i over all n runs
• Step 6 (decision step) Select the θ_i corresponding to the lowest sample mean from step 5