
G. Rubino INTRO 2 SIMU 0

INTRO 2 SIMU 1

  Your C compiler (like all compilers, software for numerical computations, etc.) lets you call a built-in function providing pseudo-random numbers that behave like realizations of a Uniform r.v. on [0,1] (for instance, on Unix, the function double drand48() provides such a service).

  More specifically, each call to such a function BEHAVES like sampling a r.v. uniformly distributed on [0,1], and n calls BEHAVE like sampling n r.v. Uniform on [0,1] and INDEPENDENT OF EACH OTHER.

  In other words, n calls to this function return n real numbers behaving like the realization of a sequence of n i.i.d. r.v. Uniform on [0,1].


INTRO 2 SIMU 2

  Actually, drand48() is completely deterministic.

  Moreover, there is another function in the library, whose prototype is void srand48(long), such that the sequence

    srand48(s); u1 = drand48(); … uN = drand48();

always produces the same values of the variables u1, …, uN.

  The parameter s (of type long) is called “the seed” of the sequence; using the same seed we get the same sequence.

  This property is extremely useful in simulation, as we will see. Of course, if the seed changes, the sequence changes as well (see below).
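The role of the seed can be checked with a few lines of C. This is only a minimal sketch: the helper names fill_uniforms and same_sequence are hypothetical (they are not part of any library), and the feature-test macro at the top is an assumption about how the C library exposes drand48() and srand48().

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* assumption: asks the C library to declare drand48()/srand48() */
#endif
#include <assert.h>
#include <stdlib.h>

/* Fill u[0..n-1] with the first n values produced after seeding with s. */
void fill_uniforms(long s, double *u, int n)
{
    assert(u != NULL && n >= 0);
    srand48(s);
    for (int i = 0; i < n; i++)
        u[i] = drand48();
}

/* Return 1 if seeds s1 and s2 give the same first n values (n <= 64), else 0. */
int same_sequence(long s1, long s2, int n)
{
    double u[64], v[64];
    assert(n >= 0 && n <= 64);
    fill_uniforms(s1, u, n);
    fill_uniforms(s2, v, n);
    for (int i = 0; i < n; i++)
        if (u[i] != v[i])
            return 0;
    return 1;
}
```

Calling same_sequence(314, 314, 10) compares two runs started from the same seed, while same_sequence(314, 315, 10) compares runs started from two different seeds.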


INTRO 2 SIMU 3

  Mathematically, we know today what randomness is.

  This has been the work of Per Martin-Löf, Gregory Chaitin, Andrei Kolmogorov, Ray Solomonoff, in the 1960s.

  One of the consequences is that, by definition, randomness is not programmable.

  The only thing we can do is to mimic randomness. Pseudo-random number generators mimic randomness according to different distributions.

  That is also all we need to simulate systems.


INTRO 2 SIMU 4

  Actually, the method used to implement drand48() and many other similar tools is the following:

  each call to drand48() makes the computer evaluate a new term in a sequence of the form

    $x_{k+1} = (a x_k + b) \bmod c$

where a, b, c are integers, c >> 1; the function then returns $x_{k+1}/c$;

  the seed of the sequence is the initial value $x_0$, which is also an integer ($x_0$ is given to the computer by means of the srand48() function);

  this explains why the sequence is deterministic and why it repeats itself if the same seed is used for two different sequences of calls;

  an appropriate choice of parameters a, b, c makes the sequence “very hard to distinguish” from a “really random one”.
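The recurrence above can be sketched in a few lines of C. The type lcg48 and the functions lcg48_seed and lcg48_next are hypothetical names chosen for this illustration. The constants a = 0x5DEECE66D, b = 0xB and c = 2^48 are, to the best of my knowledge, the ones documented for the drand48 family, but the sketch seeds the state directly with x0 and does not claim to reproduce the way srand48() builds the initial state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator state: the current term x_k of the sequence. */
typedef struct { uint64_t x; } lcg48;

#define LCG_A    0x5DEECE66DULL          /* multiplier a (assumed drand48 value) */
#define LCG_B    0xBULL                  /* increment b (assumed drand48 value)  */
#define LCG_MASK ((1ULL << 48) - 1)      /* c = 2^48, so "mod c" is a bit mask   */

/* Set the initial value x_0 (the seed), reduced mod 2^48. */
void lcg48_seed(lcg48 *g, uint64_t x0)
{
    assert(g != NULL);
    g->x = x0 & LCG_MASK;
}

/* Compute x_{k+1} = (a x_k + b) mod c and return x_{k+1}/c, a value in [0,1). */
double lcg48_next(lcg48 *g)
{
    assert(g != NULL);
    /* unsigned overflow wraps mod 2^64; since 2^48 divides 2^64 the mask is exact */
    g->x = (LCG_A * g->x + LCG_B) & LCG_MASK;
    return (double) g->x / 281474976710656.0;   /* 2^48 */
}
```

Two generators seeded with the same x0 produce exactly the same values, which is the determinism described above.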


INTRO 2 SIMU 5

  Consider the macros

    #define UNIF()        drand48()
    #define INIT_UNIF(s)  srand48(s)
    #define DIE()         ((int) ceil(6*UNIF()))

  The first two macros are just renaming.

  The function ceil(x) (the “ceiling” of x, declared in <math.h>) provides the smallest integer greater than or equal to x. For instance, ceil(3.2) = 4, ceil(11) = 11, ceil(-5.22) = -5.

  Observe that 6*UNIF() implements a r.v. Uniform on [0,6].

  EXERCISE: show that DIE() implements a Uniform r.v. taking values on the set {1,2,3,4,5,6} (that is, a “perfect (electronic) die”). Hint: observe that Pr(“obtaining a 0”) = 0.


INTRO 2 SIMU 6

  If everything is correctly done (our macros, the drand48() function, …) then if we call DIE() a large number N of times, we will observe that, say, number 5 appears approximately N/6 times (and the same for any of the possible outcomes 1, 2,…).

 Observe also that if we do the previous experiment several times (say, 4 times), using different seeds, and we observe the first, say, 10 values of each of the 4 sequences, we will probably observe very different numbers. The same will happen if we observe the 10 values after position 100000 (< N) in each of these sequences.

Regularities appear only when a large number of events is considered, and they appear under specific forms (such as averages, …).


INTRO 2 SIMU 7

  Consider the problem of simulating a coin toss. We code the outcome “heads” as 1 and the outcome “tails” as 0.

  Assume that the coin is generally biased, and that it has an associated parameter p ∈ [0,1] such that heads appears with probability p.

  A simple way of implementing such a generic coin on a computer is using the macro

    #define COIN(p) (UNIF() < (p))

  The coin is “fair” if p = 1/2.


INTRO 2 SIMU 8

  Consider the following question: what is the probability p of observing 10 “heads” and 10 “tails” when (“perfectly”) tossing a fair coin 20 times?

  Perfectly tossing means that the outcomes of the 20 tosses can be considered independent of each other.

  We know how to solve this using elementary probability: the answer is $p = \frac{20!}{(10!)^2 \, 2^{20}} \approx 0.1762$.
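As a worked check of this value: there are $\binom{20}{10}$ sequences of 20 tosses containing exactly 10 heads, and each sequence has probability $2^{-20}$, so

```latex
\[
p \;=\; \binom{20}{10}\left(\frac{1}{2}\right)^{20}
  \;=\; \frac{20!}{(10!)^2\,2^{20}}
  \;=\; \frac{184756}{1048576}
  \;\approx\; 0.1762 .
\]
```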

  Assume (absurdly) that you don’t know how to obtain the previous expression. You can then run the following code to get an idea about p:

    nbOfSuccesses = 0;
    for (n = 1; n <= N; n++) {           // repeat the experiment N times
        nbOfHeads = 0;
        for (m = 1; m <= 20; m++)        // toss the coin 20 times
            if (COIN(0.5) == 1) nbOfHeads++;
        if (nbOfHeads == 10) nbOfSuccesses++;
    }
    p = (double) nbOfSuccesses / N;      // our estimate of p


INTRO 2 SIMU 9

  We must of course use a “large” value of N.

  QUESTION 1: how large must N be?

  QUESTION 2: assume N = 1000; after running the code, can we say anything about the “quality” of the result?

  QUESTION 3: can we say anything about the value of N that is necessary to get some pre-specified accuracy in the result? And how should this idea of accuracy be formalized?

  The answers to these questions will be the object of the Output Analysis part of the course.


  Recall the previous example about the typical output of a simulation analysis of loss rates: a confidence interval must be computed.

  In the previous probability question, for a 95%-confidence interval, we can write:

    nbOfSuccesses = 0;
    for (n = 1; n <= N; n++) {           // repeat the experiment N times
        nbOfHeads = 0;
        for (m = 1; m <= 20; m++)        // toss the coin 20 times
            if (COIN(0.5) == 1) nbOfHeads++;
        if (nbOfHeads == 10) nbOfSuccesses++;
    }
    p = (double) nbOfSuccesses / N;      // our estimate of p
    p1 = p - 1.96*sqrt( p*(1-p) / (N-1) );
    p2 = p + 1.96*sqrt( p*(1-p) / (N-1) );   // interval is (p1, p2)

G. Rubino, Oct. 2008 INTRO 2 SIMU 10

  For a 99%-confidence interval, we can write:

    nbOfSuccesses = 0;
    for (n = 1; n <= N; n++) {           // repeat the experiment N times
        nbOfHeads = 0;
        for (m = 1; m <= 20; m++)        // toss the coin 20 times
            if (COIN(0.5) == 1) nbOfHeads++;
        if (nbOfHeads == 10) nbOfSuccesses++;
    }
    p = (double) nbOfSuccesses / N;      // our estimate of p
    p1 = p - 2.58*sqrt( p*(1-p) / (N-1) );
    p2 = p + 2.58*sqrt( p*(1-p) / (N-1) );   // interval is (p1, p2)

  For a 99.9%-confidence interval, …

    p1 = p - 3.29*sqrt( p*(1-p) / (N-1) );
    p2 = p + 3.29*sqrt( p*(1-p) / (N-1) );   // interval is (p1, p2)


INTRO 2 SIMU 12

 Of course, simulation can help to evaluate values that are difficult to obtain.

  For instance, let X be the minimal number N of times we must toss a fair coin such that the square of the difference between the # of heads and the # of tails is > N. Assume we want to know p = Pr(X > 10). This looks difficult to evaluate analytically.

  EXERCISE: write C code to estimate p.

  EXERCISE: write a C program to estimate the probability that, if we throw 10 dice, the square of the sum of the obtained numbers minus the sum of their squares is less than 200.


  N = # of tosses; H = # of heads, T = # of tails, H + T = N; $Y = (H - T)^2$ (tails coded 0, heads coded 1)

  N = 1
    0 -> H=0, T=1, Y=1; Y > N? No
    1 -> H=1, T=0, Y=1; Y > N? No
    so X > 1 in all cases: Pr(X=1) = 0

  N = 2
    00 -> H=0, T=2, Y=4; Y > N? Yes
    01 -> H=1, T=1, Y=0; Y > N? No
    10 -> H=1, T=1, Y=0; Y > N? No
    11 -> H=2, T=0, Y=4; Y > N? Yes
    Pr(X=2) = 2*(1/4) = 1/2

  N = 3
    000 -> Y=9; Y > N? Yes (but the condition already held at N = 2)
    001 -> Y=1; Y > N? No
    010 -> Y=1; Y > N? No
    011 -> Y=1; Y > N? No
    100 -> Y=1; Y > N? No
    101 -> Y=1; Y > N? No
    110 -> Y=1; Y > N? No
    111 -> Y=9; Y > N? Yes (but the condition already held at N = 2)
    Pr(Y > 3 after 3 tosses) = 2*(1/8) = 1/4; however, since X is the minimal N, the sequences 000 and 111 have X = 2, so Pr(X=3) = 0

  Etc.
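The enumeration above suggests a direct simulation: toss the coin, track the difference H - T, and stop at the first N where its square exceeds N. The sketch below is one possible way to estimate Pr(X > nmax), not a reference solution; survives and estimate_tail are hypothetical names.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 600   /* assumption: asks the C library to declare drand48()/srand48() */
#endif
#include <assert.h>
#include <stdlib.h>

#define UNIF()  drand48()
#define COIN(p) (UNIF() < (p))

/* Simulate one sequence of tosses; return 1 if X > nmax, i.e. if
   (H - T)^2 <= N holds for every N = 1..nmax, and 0 otherwise. */
int survives(int nmax)
{
    int d = 0;                            /* current value of H - T */
    for (int n = 1; n <= nmax; n++) {
        d += COIN(0.5) ? 1 : -1;          /* heads: +1, tails: -1 */
        if (d * d > n)
            return 0;                     /* here X = n <= nmax */
    }
    return 1;
}

/* Fraction of `runs` independent sequences for which X > nmax. */
double estimate_tail(int nmax, long runs, long seed)
{
    assert(nmax >= 1 && runs > 0);
    long hits = 0;
    srand48(seed);
    for (long i = 0; i < runs; i++)
        hits += survives(nmax);
    return (double) hits / runs;
}
```

With nmax = 10, estimate_tail returns an estimate of p = Pr(X > 10); a confidence interval can be attached to it exactly as in the 10-heads example.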


INTRO 2 SIMU 15

  “the random variable (r.v.) X has the Exponential distribution (or is Exponentially distributed) with mean M” means that
    M > 0, X ≥ 0
    $\Pr(X > t) = e^{-t/M}$ for t ≥ 0
    density of X: $e^{-t/M}/M$
    E(X) = M
    $\mathrm{Var}(X) = M^2$; $\mathrm{StdDev}(X) = \mathrm{Var}(X)^{1/2} = M$
    Cv(X) = StdDev(X)/E(X) = 1

  the number 1/M is the parameter of the distribution of X; if we denote 1/M = α, then
    $\Pr(X > t) = e^{-\alpha t}$, density = $\alpha e^{-\alpha t}$, E(X) = 1/α, etc.


INTRO 2 SIMU 16

  Recall that “U is a Uniform r.v. on [0,1]” means that
    0 ≤ U ≤ 1
    the density of U is the indicator function of the interval [0,1]
    Pr(U ≤ u) = u, for any u in [0,1]

  We then have E(U) = 1/2, V(U) = 1/12, $Cv^2(U) = V(U)/E(U)^2 = 1/3$, etc.

  PROPERTY: if U is Uniform on [0,1] and M > 0, then X = -M ln(U) is Exponential with mean M.

PROOF: start from Pr(X > t) = Pr(-M ln(U) > t) and transform until you obtain exp(-t/M), proving the claim.
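One way to carry out the transformation, using only M > 0, the fact that the logarithm is increasing, and Pr(U ≤ u) = u for u in [0,1] (applied with $u = e^{-t/M}$, which lies in (0,1] when t ≥ 0):

```latex
\begin{align*}
\Pr(X > t) &= \Pr(-M \ln U > t) \\
           &= \Pr(\ln U < -t/M) \\
           &= \Pr\!\left(U < e^{-t/M}\right) \\
           &= e^{-t/M}, \qquad t \ge 0 .
\end{align*}
```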

  This means that the following macro implements an exponential r.v. having mean M > 0:

    #define EXPO(M) (-(M)*log(UNIF()))

  Two comments on the Exponential distribution:

  when “all durations” in a model (inter-arrivals in an arrival process, service times, …) are Exponentially distributed, we have all the power of Markov theory for all sorts of analysis of the model;

  in general, the Exponential assumption leads to “pessimistic” results (which is acceptable).

  These comments are quite informal. They can be made formal in many frameworks, and they explain the frequent assumption of an Exponentially distributed duration.


INTRO 2 SIMU 18

  assume you want “to have an idea” about “the way the server system behaves”

  specifically, you want to know how many requests are “typically” in the buffer; we assume that the waiting area is infinite

  EXERCISE: write C code allowing you to observe a “possible behavior” of the evolution of the number of requests in the buffer (the buffer’s backlog) with time, up to some time T chosen by the user of the program. A possible behavior is called a trajectory (or a path) of the model.

  If we plot the evolution of the backlog (the number of units in the system) with time, up to some time T, we can get something such as

  [Figure: a sample trajectory of the backlog as a function of time, from 0 to T]

INTRO 2 SIMU 19

  more specifically, we want to have a command

    ./webserver IA S T seed

such that, when executed, it prints a “possible trace” on the output, where
    IA is the mean inter-arrival time of requests, in msec,
    S is the mean request processing time, in msec,
    T is the total observation time, in msec,
    seed is the seed of the pseudo-random numbers used

  the successive inter-arrival times (with mean IA) must be i.i.d., the same holds for the successive processing times (with mean S), and both sequences must be independent of each other.

  “possible trace” means, for instance, the use of the following syntax: arrival/departure at t; then, x request(s) in buffer


INTRO 2 SIMU 20

  for instance, we could have something like

    > ./webserver 50 30 10000 314
    >
    > arrival at 39.9; then, 1 request(s) in buffer
    > arrival at 63.6; then, 2 request(s) in buffer
    > departure at 110.2; then, 1 request(s) in buffer
    > …

  such a trace is “equivalent” to the picture

  [Figure: the corresponding trajectory of the backlog as a function of time, from 0 to T]

INTRO 2 SIMU 21

    bcklg = 0;                            // --- the system starts empty
    t_arr = EXPO(IA);                     // --- first arrival time
    t_dep = t_arr + EXPO(S);              // --- first departure time
    while (fmin(t_arr, t_dep) < T) {      // --- fmin() is declared in <math.h>
        if (t_arr <= t_dep) {             // --- arrival
            bcklg++;                      // --- one more unit in the system
            t_arr += EXPO(IA);            // --- next arrival at time t_arr
        } else {                          // --- departure
            bcklg--;                      // --- one less unit in the system
            if (bcklg > 0)
                t_dep += EXPO(S);         // --- next departure at time t_dep
            else
                t_dep = t_arr + EXPO(S);  // --- idle until the next arrival
        }
    }

INTRO 2 SIMU 22

  the trace output can be too long and it’s not very illustrative

  it would be better to have a compact metric “capturing” the “average” size of the backlog

  EXERCISE:
    define an appropriate metric corresponding to the concept of “average” in this context; call it “mean backlog” (on the interval [0,T])
    write a C program with the same input data as previously specified, and printing something like “In the period up to time T, the average occupation of the server was b requests”

  EXERCISE:
    modify the previous program (simulator) to simulate a model with a finite buffer having total capacity N requests: if a request arrives when the buffer is full, it is lost; add to the mean backlog output something like “In the period up to time T, I observed that a fraction p of the arriving requests was lost.”


INTRO 2 SIMU 23

  if T >> 1 then we know (see the queuing part of this course) that in the finite buffer case, with ρ = S/IA,
    if ρ ≠ 1, mean backlog on [0,T] ≈ $\dfrac{\rho + N\rho^{N+2} - (N+1)\rho^{N+1}}{(1-\rho)(1-\rho^{N+1})}$
    if ρ ≠ 1, loss probability on the same period ≈ $\dfrac{\rho^{N}(1-\rho)}{1-\rho^{N+1}}$
    if ρ = 1, mean backlog ≈ N/2 and loss probability ≈ 1/(N+1)

  the preceding expressions are actually the exact values of the limits of the mean backlog and the mean fraction of lost requests at time t, taken when t → ∞

  this means that no simulation is actually needed here; the situation is however a special one: for most models, no analytical result is available

  the values of the mean backlog and the fraction of losses in the interval [0,T] are also known for this specific model (but they are very complex)


INTRO 2 SIMU 24

 what if the storage capacity is so large that we decide to model the buffer as unbounded (that is, N = ∞)?

  EXERCISE: observe the behavior of the backlog (the occupation process) in the unbounded model, when time increases, in the following cases:
    - IA = 120, S = 100
    - IA = 80, S = 100
    - IA = 100, S = 100


INTRO 2 SIMU 25

 when IA > S, the system is stable: when time increases, the mean backlog converges towards a fixed value (actually, the number ρ/(1 - ρ), where ρ = S/IA < 1)

 when IA < S, the system is unstable: when time increases, the backlog increases too (probabilistically), going to ∞ with time

 when IA = S, the system is also unstable, and we observe the same behavior as when IA < S


INTRO 2 SIMU 26

  stability is an important issue then; it is relevant when the number of requests stored in the buffer has no a priori limit; in the finite buffer case, instability can not happen

  stability issues are difficult to address through simulation; analytical techniques are the right way to study them (however, they are also (in general) difficult mathematical problems)


INTRO 2 SIMU 27

  suppose we want to measure the average of the response times of the first K customers

  EXERCISE: modify the previous C program in order to evaluate this metric (you will need to maintain a list where each item represents a request in the buffer, storing its arrival time)

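  EXERCISE: prove Lindley’s relation:
    let $A_k$ be the time of the kth arrival, $S_k$ its service time and $R_k$ its response time
    let $IA_k = A_{k+1} - A_k$, k = 0, 1, 2, …, with $A_0 = 0$ (and, by convention, $R_0 = 0$)
    then, for k = 1, 2, …, we have

    $R_k = S_k + (R_{k-1} - IA_{k-1})^+$, where $x^+ = \max\{x, 0\}$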

  EXERCISE: write a C program evaluating the number (R1 + R2 + … + RK)/K in the infinite buffer model using Lindley’s relation


INTRO 2 SIMU 28

  with the previous notation, consider, in the unbounded (that is, N = ∞) and stable (IA > S) model,
    a, the mean arrival rate in [0,T], defined as A(T)/T where A(T) is the number of arrivals between 0 and T
    d, the mean departure rate in [0,T], defined as D(T)/T where D(T) is the number of departures between 0 and T
    b, the mean backlog during [0,T], explored in previous exercises
    r, the mean response time for the first K requests, explored in the previous question as well
    u, the fraction of the interval [0,T] where the server was busy

  “verify”, using the simulators, that
    a ≈ 1/IA
    a ≈ d (Mean Flow Conservation theorem)
    aS ≈ u and ar ≈ b (Little’s theorem)


INTRO 2 SIMU 29

  assume we want to estimate again the “long-term” behavior of the backlog

  “long-term” means to evaluate the mean backlog “far from 0”, “once things have stabilized”

  in order to do this, we can either
    simulate our queue from 0 to T for a “large” T, as seen before, or
    simulate the queue from 0 to some W without measuring anything (W is called the “warm-up” time), then continue to simulate until T, taking measurements only on the interval [W,T]

  the issue of determining the right values of T in the first option, or of W and T in the second, will be addressed in the “output analysis” part of this course

  EXERCISE: modify the previous programs to integrate this warm-up parameter

