G. Rubino INTRO 2 SIMU 0
INTRO 2 SIMU 1
Your C compiler (like all compilers, numerical computing software, etc.) lets you call a built-in library function providing pseudo-random numbers that behave as realizations of a Uniform r.v. on [0,1] (for instance, on Unix, there is the function double drand48(), whose values actually lie in [0,1)).
More specifically, each call to such a function BEHAVES as sampling a r.v. uniformly
distributed on [0,1], and n calls BEHAVE as sampling n r.v. Uniform on [0,1] and
INDEPENDENT OF EACH OTHER. In other words, n calls to this function return n real numbers behaving
as the realization of a sequence of n i.i.d. r.v. Uniform on [0,1].
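A quick way to observe this behavior is to draw many values and compare their sample mean and variance with E(U) = 1/2 and V(U) = 1/12. The following is a minimal sketch, assuming a POSIX system where drand48() and srand48() are declared in <stdlib.h>; the function name uniform_sample_stats is hypothetical.

```c
#define _XOPEN_SOURCE 600          // exposes drand48()/srand48() in strict C modes
#include <stdlib.h>

// Draws n values with drand48() and returns their sample mean;
// the (1/n) sample variance is stored in *var.
double uniform_sample_stats(long n, long seed, double *var)
{
    double sum = 0.0, sum2 = 0.0;
    srand48(seed);                 // fixes the starting point of the sequence
    for (long i = 0; i < n; i++) {
        double u = drand48();      // behaves as a Uniform value on [0,1)
        sum += u;
        sum2 += u * u;
    }
    double mean = sum / n;
    *var = sum2 / n - mean * mean;
    return mean;
}
```

With a million draws, the mean should be close to 0.5 and the variance close to 1/12 ≈ 0.0833.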
INTRO 2 SIMU 2
Actually, drand48() is completely deterministic. Moreover, there is another function in the library, whose prototype is void srand48(long), such that the sequence
srand48(s); u1 = drand48(); … uN = drand48();
always produces the same values of the variables u1, …, uN. The parameter long s is called “the seed” of the sequence;
using the same seed we get the same sequence. This property is extremely useful in simulation, as we will
see. Of course, if the seed changes, the sequence changes as well
(see below).
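The reproducibility property can be checked directly: restarting the generator with the same seed gives back exactly the same numbers, while a different seed gives a different sequence. A minimal sketch, assuming a POSIX system; fill_uniforms is a hypothetical helper name.

```c
#define _XOPEN_SOURCE 600          // exposes drand48()/srand48() in strict C modes
#include <stdlib.h>

// Fills u[0..n-1] with the first n values of the sequence started with seed s.
void fill_uniforms(long s, double u[], int n)
{
    srand48(s);                    // same seed => same sequence
    for (int i = 0; i < n; i++)
        u[i] = drand48();
}
```

Calling fill_uniforms(314, a, 5) twice fills a with identical values both times; this is what makes a simulation run repeatable, for instance when debugging.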
INTRO 2 SIMU 3
Mathematically, we now know what randomness is. This was the work of Per Martin-Löf, Gregory Chaitin,
Andrei Kolmogorov and Ray Solomonoff in the 1960s. One of the consequences is that, by definition, randomness is
not programmable. The only thing we can do is to mimic randomness. Pseudo-random
number generators mimic randomness according to different distributions.
That is also all we need to simulate systems.
INTRO 2 SIMU 4
Actually, the method used to implement drand48() and many other similar tools is the following: each call to drand48() makes the computer evaluate a new term in a
sequence of the form $x_{k+1} = (a x_k + b) \bmod c$,
where a, b, c are integers, c >> 1; the function then returns $x_{k+1}/c$; the seed of the sequence is the initial value x0, which is also an integer
(x0 is given to the computer by means of the srand48() function); this explains why the sequence is deterministic and why it repeats
itself if the same seed is used for two different sequences of calls;
an appropriate choice of parameters a, b, c makes the sequence “very hard to distinguish” from a “really random one”.
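The recurrence above can be sketched in a few lines. This sketch uses the constants that the POSIX specification gives for drand48(): multiplier a = 0x5DEECE66D, increment b = 0xB and modulus c = 2^48, with srand48(s) starting from an x0 whose high 32 bits come from s and whose low 16 bits are 0x330E. The names lcg48, lcg48_seed and lcg48_next are hypothetical; whether this reproduces your library's drand48() bit for bit is an assumption to check on your platform.

```c
#include <stdint.h>

typedef struct { uint64_t x; } lcg48;      // hypothetical generator state x_k

#define LCG48_A    0x5DEECE66DULL          // multiplier a
#define LCG48_B    0xBULL                  // increment b
#define LCG48_MASK ((1ULL << 48) - 1)      // reduction mod c = 2^48

// Like srand48(): high 32 bits of x0 taken from the seed, low 16 bits = 0x330E.
void lcg48_seed(lcg48 *g, long s)
{
    g->x = ((((uint64_t) s) & 0xFFFFFFFFULL) << 16) | 0x330EULL;
}

// Computes x_{k+1} = (a x_k + b) mod 2^48 and returns x_{k+1} / 2^48, in [0,1).
double lcg48_next(lcg48 *g)
{
    // uint64_t arithmetic wraps mod 2^64; since 2^48 divides 2^64,
    // masking the low 48 bits afterwards gives the exact value mod 2^48
    g->x = (LCG48_A * g->x + LCG48_B) & LCG48_MASK;
    return (double) g->x / 281474976710656.0;  // 2^48
}
```

For seed 0, x0 = 0x330E = 13070 and the first step gives x1 = 48083817484545, that is, a first value of about 0.170828.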
INTRO 2 SIMU 5
Consider the macros
#define UNIF()       drand48()
#define INIT_UNIF(s) srand48(s)
#define DIE()        ((int) ceil(6*UNIF()))
The first two macros are just renamings. The C function ceil(x) (declared in <math.h>) provides the smallest integer greater
than or equal to x. For instance, ceil(3.2) = 4, ceil(11) = 11, ceil(-5.22) = -5.
Observe that 6*UNIF() implements a r.v. Uniform on [0,6]. EXERCISE: show that DIE() implements a Uniform r.v. taking values in the set {1,2,3,4,5,6} (that is, a “perfect (electronic) die”). Hint: observe that Pr(“obtaining a 0”) = 0.
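One way to carry out the exercise, treating U = UNIF() as an exactly Uniform r.v. on [0,1]: since 0 ≤ 6U ≤ 6, ⌈6U⌉ takes values in {0,1,…,6}, and for each k in {1,…,6}

```latex
\Pr(\lceil 6U \rceil = k) = \Pr(k-1 < 6U \le k)
  = \Pr\!\left(\tfrac{k-1}{6} < U \le \tfrac{k}{6}\right) = \tfrac{1}{6},
\qquad
\Pr(\lceil 6U \rceil = 0) = \Pr(U = 0) = 0 .
```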
INTRO 2 SIMU 6
If everything is correctly done (our macros, the drand48() function, …) then if we call DIE() a large number N of times, we will observe that, say, number 5 appears approximately N/6 times (and the same for any of the possible outcomes 1, 2,…).
Observe also that if we do the previous experiment several times (say, 4 times), using different seeds, and we observe the first, say, 10 values of each of the 4 sequences, we will probably observe very different numbers. The same will happen if we observe the 10 values after position 100000 (< N) in each of these sequences.
Regularities appear only when a large number of events is considered, and they appear under specific forms (such as averages, …).
INTRO 2 SIMU 7
Consider the problem of simulating coin tosses. We code the outcome “heads” as 1 and the outcome “tails” as 0. Assume that the coin may be biased, and that it has an
associated parameter p ∈ [0,1] such that heads appears with probability p.
A simple way of implementing such a generic coin on a computer is using the macro
#define COIN(p) (UNIF() < (p))
The coin is “fair” if p = 1/2.
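As a sanity check of the COIN(p) macro, one can toss the coin many times and compare the observed fraction of heads with p. Since drand48() returns values in [0,1), Pr(UNIF() < p) = p for any p in [0,1]. A minimal sketch; heads_fraction is a hypothetical name.

```c
#define _XOPEN_SOURCE 600          // exposes drand48()/srand48() in strict C modes
#include <stdlib.h>

#define UNIF()  drand48()
#define COIN(p) (UNIF() < (p))     // 1 ("heads") with probability p

// Tosses the p-coin N times and returns the observed fraction of heads.
double heads_fraction(double p, long N, long seed)
{
    long heads = 0;
    srand48(seed);
    for (long n = 0; n < N; n++)
        heads += COIN(p);
    return (double) heads / N;
}
```

The parentheses around p in the macro matter: without them, an argument such as an expression with lower precedence than < would be split incorrectly.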
INTRO 2 SIMU 8
Consider the following question: what is the probability p of observing 10 “heads” and 10 “tails” when (“perfectly”) tossing a fair coin 20 times?
Perfectly tossing means that the outcomes of the 20 tosses can be considered mutually independent.
We know how to solve this using elementary probability: the answer is $p = \frac{20!}{(10!)^2\, 2^{20}} \approx 0.1762$.
Assume (absurdly) that you don’t know how to obtain the previous expression. You can then run the following code to get an idea about p:
nbOfSuccesses = 0;
for (n = 1; n <= N; n++) {               // repeat the experiment N times
    nbOfHeads = 0;
    for (m = 1; m <= 20; m++)            // toss the coin 20 times
        if (COIN(0.5) == 1) nbOfHeads++;
    if (nbOfHeads == 10) nbOfSuccesses++;
}
p = (double) nbOfSuccesses/N;            // our estimate of p
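The loop above can be wrapped into a self-contained function, with the declarations it needs and the seed as a parameter. A minimal sketch; estimate_p is a hypothetical name.

```c
#define _XOPEN_SOURCE 600          // exposes drand48()/srand48() in strict C modes
#include <stdlib.h>

#define UNIF()  drand48()
#define COIN(p) (UNIF() < (p))

// Estimates p = Pr(10 heads in 20 tosses of a fair coin)
// from N independent repetitions of the experiment.
double estimate_p(long N, long seed)
{
    long nbOfSuccesses = 0;
    srand48(seed);
    for (long n = 1; n <= N; n++) {          // repeat the experiment N times
        int nbOfHeads = 0;
        for (int m = 1; m <= 20; m++)        // toss the coin 20 times
            if (COIN(0.5) == 1) nbOfHeads++;
        if (nbOfHeads == 10) nbOfSuccesses++;
    }
    return (double) nbOfSuccesses / N;       // our estimate of p
}
```

With N = 200000, the estimate should fall close to the exact value 0.1762.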
INTRO 2 SIMU 9
We must of course use a “large” value of N. QUESTION 1: how large must N be? QUESTION 2: assume N = 1000; after running the code, can we
say anything about the “quality” of the result? QUESTION 3: can we say anything about the value of N that is
necessary to get some pre-specified accuracy in the result? And how should this idea of accuracy be formalized?
The answers to these questions will be the object of the Output Analysis part of the course.
Recall the previous example about the typical output of a simulation analysis of loss rates: a confidence interval must be computed.
In the previous probability question, for a 95%-confidence interval, we can write:
nbOfSuccesses = 0;
for (n = 1; n <= N; n++) {               // repeat the experiment N times
    nbOfHeads = 0;
    for (m = 1; m <= 20; m++)            // toss the coin 20 times
        if (COIN(0.5) == 1) nbOfHeads++;
    if (nbOfHeads == 10) nbOfSuccesses++;
}
p = (double) nbOfSuccesses/N;            // our estimate of p
p1 = p - 1.96*sqrt( p*(1-p) / (N-1) );
p2 = p + 1.96*sqrt( p*(1-p) / (N-1) );   // interval is (p1, p2)
G. Rubino, Oct. 2008 INTRO 2 SIMU 10
For a 99%-confidence interval, we can write:
nbOfSuccesses = 0;
for (n = 1; n <= N; n++) {               // repeat the experiment N times
    nbOfHeads = 0;
    for (m = 1; m <= 20; m++)            // toss the coin 20 times
        if (COIN(0.5) == 1) nbOfHeads++;
    if (nbOfHeads == 10) nbOfSuccesses++;
}
p = (double) nbOfSuccesses/N;            // our estimate of p
p1 = p - 2.58*sqrt( p*(1-p) / (N-1) );
p2 = p + 2.58*sqrt( p*(1-p) / (N-1) );   // interval is (p1, p2)
For a 99.9%-confidence interval, …
p1 = p - 3.29*sqrt( p*(1-p) / (N-1) );
p2 = p + 3.29*sqrt( p*(1-p) / (N-1) );   // interval is (p1, p2)
INTRO 2 SIMU 11
INTRO 2 SIMU 12
Of course, simulation can also help evaluate quantities that are difficult to obtain analytically.
For instance, let X be the minimal number N of times we must toss a fair coin such that the square of the difference between the number of heads and the number of tails is > N. Assume we want to know p = Pr(X > 10). This looks difficult to evaluate analytically.
EXERCISE: design C code able to evaluate p.
EXERCISE: write a C program to evaluate the probability that, if we throw 10 dice, the square of the sum of the obtained numbers minus the sum of their squares is less than 200.
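One possible approach to the first exercise, offered as a sketch rather than the expected solution: the event {X > limit} occurs exactly when $(H-T)^2 \le k$ after each of the first limit tosses, so each experiment only needs the first limit tosses. The function names x_exceeds and estimate_pr_X_greater are hypothetical.

```c
#define _XOPEN_SOURCE 600          // exposes drand48()/srand48() in strict C modes
#include <stdlib.h>

#define UNIF()  drand48()
#define COIN(p) (UNIF() < (p))     // 1 ("heads") with probability p

// One experiment: returns 1 if (H - T)^2 <= k for every k = 1..limit,
// that is, if the event {X > limit} occurred.
static int x_exceeds(int limit)
{
    int diff = 0;                          // H - T after k tosses
    for (int k = 1; k <= limit; k++) {
        diff += COIN(0.5) ? 1 : -1;
        if (diff * diff > k) return 0;     // X = k <= limit
    }
    return 1;
}

// Estimates Pr(X > limit) from N independent experiments.
double estimate_pr_X_greater(int limit, long N, long seed)
{
    long count = 0;
    srand48(seed);
    for (long n = 0; n < N; n++)
        count += x_exceeds(limit);
    return (double) count / N;
}
```

estimate_pr_X_greater(10, 1000000, 314) then estimates p = Pr(X > 10). Small limits give easy checks: Pr(X > 1) = 1, and Pr(X > 3) = Pr(X > 2) = 1/2.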
N = number of tosses, H = number of heads, T = number of tails (H + T = N), $Y = (H - T)^2$
N = 1:
0 -> H=0, T=1, Y=1; Y > N? No; so X > 1
1 -> H=1, T=0, Y=1; Y > N? No; so X > 1
Pr(X=1) = 0
N = 2:
00 -> H=0, T=2, Y=4; Y > N? Yes
01 -> H=1, T=1, Y=0; Y > N? No
10 -> H=1, T=1, Y=0; Y > N? No
11 -> H=2, T=0, Y=4; Y > N? Yes
Pr(X=2) = 2*(1/4) = 1/2
INTRO 2 SIMU 13
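N = 3:
000 -> Y=9; Y > N? Yes
001 -> Y=1; Y > N? No
010 -> Y=1; Y > N? No
011 -> Y=1; Y > N? No
100 -> Y=1; Y > N? No
101 -> Y=1; Y > N? No
110 -> Y=1; Y > N? No
111 -> Y=9; Y > N? Yes
Pr(Y > 3) = 2*(1/8) = ¼, but the only sequences with Y > 3, namely 000 and 111, already satisfied Y > N at N = 2 (prefixes 00 and 11), so for them X = 2; hence Pr(X=3) = 0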
Etc.
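The table can be continued mechanically by enumerating all $2^n$ equally likely toss sequences and recording, for each one, the first k with $(H-T)^2 > k$. This gives the exact values Pr(X = k) for k ≤ n, which is also a useful check for a simulator. A sketch; exact_X_distribution is a hypothetical name, and n should stay small (say n ≤ 30) since the cost is exponential.

```c
// Exact distribution of X for small n: prob[k] = Pr(X = k), k = 1..n,
// by enumerating the 2^n equally likely sequences of n fair tosses
// (bit k-1 of s is 1 for "heads" at toss k). prob must have n + 1 entries.
void exact_X_distribution(int n, double prob[])
{
    unsigned long total = 1UL << n;
    for (int k = 0; k <= n; k++) prob[k] = 0.0;
    for (unsigned long s = 0; s < total; s++) {
        int diff = 0;                              // H - T so far
        for (int k = 1; k <= n; k++) {
            diff += ((s >> (k - 1)) & 1UL) ? 1 : -1;
            if (diff * diff > k) {                 // first k with Y > k: X = k
                prob[k] += 1.0 / total;
                break;
            }
        }
    }
}
```

For n = 5 it gives Pr(X=1) = 0, Pr(X=2) = 1/2, Pr(X=3) = Pr(X=4) = 0 and Pr(X=5) = 1/8; with n = 10, Pr(X > 10) = 1 − (Pr(X=1) + … + Pr(X=10)).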
INTRO 2 SIMU 14
INTRO 2 SIMU 15
“the random variable (r.v.) X has the Exponential distribution (or is Exponentially distributed) with mean M” means that M > 0 and
- X ≥ 0, with $\Pr(X > t) = e^{-t/M}$ for t ≥ 0
- density of X: $e^{-t/M}/M$, t ≥ 0
- $E(X) = M$, $\mathrm{Var}(X) = M^2$, $\mathrm{StdDev}(X) = (\mathrm{Var}(X))^{1/2} = M$
- $\mathrm{Cv}(X) = \mathrm{StdDev}(X)/E(X) = 1$
the number 1/M is the parameter of the distribution of X; if we denote 1/M = α, then Pr(X > t) = exp(-αt), density = α exp(-αt), E(X) = 1/α, etc.
INTRO 2 SIMU 16
Recall that “U is a Uniform r.v. on [0,1]” means that
- 0 ≤ U ≤ 1
- the density of U is the indicator function $1_{[0,1]}$
- $\Pr(U \le u) = u$, for any u in [0,1]
We then have $E(U) = 1/2$, $V(U) = 1/12$, $\mathrm{Cv}^2(U) = V(U)/E^2(U) = 1/3$, etc.
PROPERTY: if U is Uniform on [0,1] and M > 0, then X = -M ln(U) is Exponential with mean M.
PROOF: start from Pr(X > t) = Pr(-M ln(U) > t) and transform until you obtain exp(-t/M), proving the claim.
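Carrying out the transformation, for t ≥ 0, using M > 0, the fact that ln is increasing, and $\Pr(U < u) = u$ for $u = e^{-t/M} \in (0,1]$:

```latex
\Pr(X > t) = \Pr(-M \ln U > t) = \Pr\!\left(\ln U < -\tfrac{t}{M}\right)
           = \Pr\!\left(U < e^{-t/M}\right) = e^{-t/M} .
```

Moreover $X \ge 0$ because $\ln U \le 0$.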
This means that the following macro implements an exponential r.v. having mean M > 0:
#define EXPO(M) (-(M)*log(UNIF()))
Two comments on the Exponential distribution:
- when “all durations” in a model (inter-arrival times in an arrival process, service times, …) are Exponentially distributed, we have all the power of Markov theory for all sorts of analysis of the model;
- in general, the Exponential assumption leads to “pessimistic” results (which is acceptable).
These comments are quite informal. They can be made formal in many frameworks, and they explain the frequent assumption of an Exponentially distributed duration.
INTRO 2 SIMU 17
INTRO 2 SIMU 18
assume you want “to have an idea” about “the way the server system behaves”
specifically, you want to know how many requests are “typically” in the buffer; we assume that the waiting area is infinite
EXERCISE: write C code allowing you to observe a “possible behavior” of the evolution of the number of requests in the buffer (the buffer’s backlog) with time, up to some time T chosen by the user of the program. A possible behavior is called a trajectory (or a path) of the model.
If we plot the evolution of the backlog (the number of units in the system) with time, up to some time T, we can get something such as
[Figure: a possible trajectory of the backlog versus time, up to time T]
INTRO 2 SIMU 19
more specifically, we want to have a command
./webserver IA S T seed
such that, when executed, it prints a “possible trace” on the output, where
- IA is the mean inter-arrival time of requests, in msec,
- S is the mean request processing time, in msec,
- T is the total observation time, in msec,
- seed is the seed of the pseudo-random numbers used;
the successive inter-arrival times must be i.i.d., the same holds for the successive processing times, and both sequences must be independent of each other.
“possible trace” means, for instance, the use of the following syntax: arrival/departure at t; then, x request(s) in buffer
INTRO 2 SIMU 20
for instance, we could have something like
> ./webserver 50 30 10000 314
> arrival at 39.9; then, 1 request(s) in buffer
> arrival at 63.6; then, 2 request(s) in buffer
> departure at 110.2; then, 1 request(s) in buffer
> …
such a trace is “equivalent” to the picture
[Figure: the corresponding backlog versus time, up to time T]
INTRO 2 SIMU 21
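#define MIN(a, b) ((a) < (b) ? (a) : (b))    // --- min() is not standard C
bcklg = 0;                                   // --- the system starts empty
t_arr = EXPO(IA);                            // --- first arrival time
t_dep = t_arr + EXPO(S);                     // --- first departure time
while (MIN(t_arr, t_dep) < T) {
    if (t_arr <= t_dep) {                    // --- arrival
        bcklg++;                             // --- one more unit in the system
        printf("arrival at %.1f; then, %d request(s) in buffer\n", t_arr, bcklg);
        t_arr += EXPO(IA);                   // --- next arrival at time t_arr
    } else {                                 // --- departure
        bcklg--;                             // --- one less unit in the system
        printf("departure at %.1f; then, %d request(s) in buffer\n", t_dep, bcklg);
        if (bcklg > 0) t_dep += EXPO(S);     // --- next departure at time t_dep
        else t_dep = t_arr + EXPO(S);        // --- empty: next departure follows next arrival
    }
}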
INTRO 2 SIMU 22
the trace output can be too long, and it is not very illustrative; it would be better to have a compact metric “capturing” the
“average” size of the backlog. EXERCISE:
define an appropriate metric corresponding to the concept of “average” in this context; call it “mean backlog” (on the interval [0,T])
write a C program with the same input data as previously specified, and printing something like “In the period up to time T, the average occupation of the server was b requests”
EXERCISE: modify the previous program (simulator) to simulate a model with a
finite buffer having total capacity N requests: if a request arrives when the buffer is full, it is lost; add to the mean backlog output something like “In the period up to time T, I observed that a fraction p of the arriving requests was lost.”
INTRO 2 SIMU 23
if T >> 1, then we know (see the queueing part of this course) that, in the finite buffer case,
- mean backlog on [0,T] ≈ $\frac{\rho + N\rho^{N+2} - (N+1)\rho^{N+1}}{(1-\rho)(1-\rho^{N+1})}$, where $\rho = S/IA$ is assumed to be ≠ 1,
- loss probability on the same period ≈ $\frac{\rho^N (1-\rho)}{1-\rho^{N+1}}$ if ρ ≠ 1,
- if ρ = 1, then mean backlog ≈ N/2 and loss probability ≈ 1/(N+1)
the preceding expressions are actually the exact values of the limits of the mean backlog and the mean fraction of lost requests at time t, taken when t → ∞
this means that no simulation is actually needed here; the situation is however a special one: for most models, no analytical result is available
the values of the mean backlog and the fraction of losses in the interval [0,T] are also known for this specific model (but they are very complex)
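The long-run expressions of this slide are easy to evaluate, for instance to compare them with the output of a finite-buffer simulator. A sketch, with hypothetical function names (the model is the one usually denoted M/M/1/N); a small loop computes the powers so that no math library is needed.

```c
// x^n for integer n >= 0 (avoids pow() and the math library)
static double power(double x, int n)
{
    double r = 1.0;
    while (n-- > 0) r *= x;
    return r;
}

// Long-run mean backlog of the finite-buffer model, rho = S/IA, capacity N.
double mm1n_mean_backlog(double rho, int N)
{
    if (rho == 1.0) return N / 2.0;
    return (rho + N * power(rho, N + 2) - (N + 1) * power(rho, N + 1))
           / ((1.0 - rho) * (1.0 - power(rho, N + 1)));
}

// Long-run loss probability of the same model.
double mm1n_loss(double rho, int N)
{
    if (rho == 1.0) return 1.0 / (N + 1);
    return power(rho, N) * (1.0 - rho) / (1.0 - power(rho, N + 1));
}
```

For ρ = 0.5 and N = 3 these give 0.34375/0.46875 ≈ 0.7333 and 0.0625/0.9375 ≈ 0.0667; as ρ → 1 the loss probability tends to 1/(N+1).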
INTRO 2 SIMU 24
what if the storage capacity is so large that we decide to model the buffer as unbounded (that is, N = ∞)?
EXERCISE: observe the behavior of the backlog (the occupation process) in the
unbounded model, when time increases, in the following cases:
- IA = 120, S = 100
- IA = 80, S = 100
- IA = 100, S = 100
INTRO 2 SIMU 25
when IA > S, the system is stable: when time increases, the mean backlog converges towards a fixed value (actually, the number ρ/(1 - ρ), where ρ = S/IA < 1)
when IA < S, the system is unstable: when time increases, the backlog increases too (probabilistically), going to ∞ with time
when IA = S, the system is also unstable, and we observe qualitatively the same behavior as when IA < S (the backlog is unbounded, although it grows more slowly)
INTRO 2 SIMU 26
stability is then an important issue; it is relevant when the number of requests stored in the buffer has no a priori limit; in the finite buffer case, instability cannot happen
stability issues are difficult to address through simulation; analytical techniques are the right way to study them (however, they are also (in general) difficult mathematical problems)
INTRO 2 SIMU 27
suppose we want to measure the average of the response times of the first K customers
EXERCISE: modify the previous C program in order to evaluate this metric (you will need to maintain a list where each item represents a request in the buffer, storing its arrival time)
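EXERCISE: prove Lindley’s relation: let $A_k$ be the time of the kth arrival, $S_k$ its service time and $R_k$ its response time (waiting time plus service time); let $IA_k = A_{k+1} - A_k$, k = 0, 1, 2, …, with $A_0 = 0$ and $R_0 = 0$; then, we have
$R_k = S_k + (R_{k-1} - IA_{k-1})^+$, where $x^+ = \max\{x, 0\}$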
EXERCISE: write a C program evaluating the number (R1 + R2 + … + RK)/K in the infinite buffer model using Lindley’s relation
INTRO 2 SIMU 28
with the previous notation, consider, in the unbounded (that is, N = ∞) and stable (IA > S) model,
- a, the mean arrival rate in [0,T], defined as A(T)/T, where A(T) is the number of arrivals between 0 and T,
- d, the mean departure rate in [0,T], defined as D(T)/T, where D(T) is the number of departures between 0 and T,
- b, the mean backlog during [0,T], explored in previous exercises,
- r, the mean response time of the first K requests, explored in the previous question as well,
- u, the fraction of the interval [0,T] during which the server was busy;
“verify”, using the simulators, that
- a ≈ 1/IA,
- a ≈ d (Mean Flow Conservation theorem),
- aS ≈ u and ar ≈ b (Little’s theorem).
INTRO 2 SIMU 29
assume we want to estimate again the “long-term” behavior of the backlog
“long-term” means to evaluate the mean backlog “far from 0”, “once things have stabilized”
in order to do this, we can either
- simulate our queue from 0 to T for a “large” T, as seen before, or
- simulate the queue from 0 to some W without measuring anything (W is called the “warm-up” time), then continue to simulate until T, taking the measures only on the interval [W,T]
the issue of determining the right values of T in the first option, or of W and T in the second, will be addressed in the “output analysis” part of this course
EXERCISE: modify previous programs integrating this warm-up parameter