09 Input Modeling - RWTH Aachen University · 2007. 6. 26.


Computer Science, Informatik 4 Communication and Distributed Systems

Simulation

Modeling and Performance Analysis with Discrete-Event Simulation

Dr. Mesut Güneş


Chapter 9

Input Modeling

Contents

• Data Collection
• Identifying the Distribution with Data
• Parameter Estimation
• Goodness-of-Fit Tests
• Fitting a Nonstationary Poisson Process
• Selecting Input Models without Data
• Multivariate and Time-Series Input Data

Purpose & Overview

Input models provide the driving force for a simulation model.
The quality of the output is no better than the quality of the inputs.
In this chapter, we will discuss the 4 steps of input model development:
1) Collect data from the real system
2) Identify a probability distribution to represent the input process
3) Choose parameters for the distribution
4) Evaluate the chosen distribution and parameters for goodness of fit.

Data Collection

One of the biggest tasks in solving a real problem
• GIGO – Garbage-In-Garbage-Out

[Figure: flow from Raw Data to Input Data to Simulation to Output (System Performance)]

Even when the model structure is valid, simulation results can be misleading if the input data is
• inaccurately collected
• inappropriately analyzed
• not representative of the environment

Data Collection

Suggestions that may enhance and facilitate data collection:
• Plan ahead: begin with a practice or pre-observation session, watch for unusual circumstances
• Analyze the data as it is being collected: check adequacy
• Combine homogeneous data sets: successive time periods, or the same time period on successive days
• Be aware of data censoring: the quantity is not observed in its entirety, danger of leaving out long process times
• Check for relationships between variables (scatter diagram)
• Check for autocorrelation
• Collect input data, not performance data
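The autocorrelation check suggested above can be sketched as follows. This is a minimal illustration, not part of the original slides: the function name `lag_autocorrelation` and the service-time values are hypothetical, and the estimator shown is the common sample autocorrelation at a given lag.

```python
from statistics import mean


def lag_autocorrelation(data, lag=1):
    """Sample autocorrelation of `data` at the given lag (assumes non-constant data)."""
    n = len(data)
    m = mean(data)
    denom = sum((x - m) ** 2 for x in data)
    num = sum((data[i] - m) * (data[i + lag] - m) for i in range(n - lag))
    return num / denom


# Hypothetical service times (minutes), in the order they were observed.
service_times = [2.1, 2.4, 2.2, 3.9, 4.1, 4.0, 2.0, 2.3, 2.2, 4.2, 3.8, 4.0]
rho1 = lag_autocorrelation(service_times, lag=1)
print(f"lag-1 autocorrelation: {rho1:.3f}")
```

A value far from zero hints that successive observations are not independent, in which case fitting a single distribution to the data as if it were an independent sample may be misleading.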

Identifying the Distribution

• Histograms
• Scatter Diagrams
• Selecting families of distributions
• Parameter estimation
• Goodness-of-fit tests
• Fitting a non-stationary process

Histograms

A frequency distribution or histogram is useful in determining the shape of a distribution.
The number of class intervals depends on:
• The number of observations
• The dispersion of the data
• Suggested number of intervals: the square root of the sample size

For continuous data:
• Corresponds to the probability density function of a theoretical distribution
For discrete data:
• Corresponds to the probability mass function

If few data points are available
• combine adjacent cells to eliminate the ragged appearance of the histogram
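As a rough sketch of the square-root rule for the number of class intervals, the following builds equal-width cells over the data range. The function name `histogram` and the sample data are assumptions made for illustration; the sketch also assumes the data are not all identical, since the cell width would otherwise be zero.

```python
from math import ceil, sqrt


def histogram(data, num_intervals=None):
    """Return (edges, counts) for equal-width class intervals.

    If num_intervals is not given, use the square root of the sample size.
    """
    n = len(data)
    k = num_intervals or ceil(sqrt(n))
    lo, hi = min(data), max(data)
    width = (hi - lo) / k
    counts = [0] * k
    for x in data:
        # The maximum value falls into the last cell rather than a new one.
        idx = min(int((x - lo) / width), k - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(k + 1)]
    return edges, counts


# Hypothetical sample of 16 observations, so sqrt(16) = 4 intervals.
edges, counts = histogram(list(range(16)))
print(edges, counts)
```

Passing a smaller `num_intervals` is one way to combine adjacent cells when few data points are available.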

Histograms

[Figure: three histograms of the same data with different interval sizes]

Histograms – Example

Vehicle Arrival Example: The number of vehicles arriving at an intersection between 7:00 am and 7:05 am was monitored for 100 random workdays.
There are ample data, so the histogram may have a cell for each possible value in the data range.

Arrivals per Period   Frequency
0                     12
1                     10
2                     19
3                     17
4                     10
5                     8
6                     7
7                     5
8                     5
9                     3
10                    3
11                    1

[Figure: histogram of frequency versus number of arrivals per period, 0 to 11]

Histograms – Example

Life tests were performed on electronic components at 1.5 times the nominal voltage, and their lifetime was recorded.

Component Life   Frequency
0 ≤ x < 3        23
3 ≤ x < 6        10
6 ≤ x < 9        5
9 ≤ x < 12       1
12 ≤ x < 15      1
…
42 ≤ x < 45      1
…
144 ≤ x < 147    1

Histograms – Example

Stanford University Mobile Activity Traces (SUMATRA)
• Target community: cellular network research community
• Traces contain mobility as well as connection information
Available traces
• SULAWESI (S.U. Local Area Wireless Environment Signaling Information)
• BALI (Bay Area Location Information)

BALI Characteristics
• San Francisco Bay Area
• Trace length: 24 hours
• Number of cells: 90
• Persons per cell: 1100
• Persons in total: 99,000
• Active persons: 66,550
• Move events: 243,951
• Call events: 1,570,807

Question: How can the BALI information be transformed so that it is usable with a network simulator, e.g., ns-2?
• The number of nodes as well as the number of connections is too high for ns-2

Histograms – Example

Analysis of the BALI Trace
• Goal: Reduce the amount of data by identifying user groups
User group
• Between 2 local minima
• Communication characteristic is kept in the group
• A user represents a group
Groups with different mobility characteristics
• Intra- and inter-group communication
Interesting characteristic
• Number of people with an odd number of movements is negligible!

[Figure: histogram of People over Calls and Movements]
[Figure: histogram of Number of People versus Number of Movements]

Scatter Diagrams

A scatter diagram is a quality tool that can show the relationship between paired data
• Random Variable X = Data 1
• Random Variable Y = Data 2
• Draw random variable X on the x-axis and Y on the y-axis

[Figure: three scatter diagrams labeled Strong Correlation, Moderate Correlation, and No Correlation]

Scatter Diagrams

Linear relationship
• Correlation: Measures how well data line up
• Slope: Measures the steepness of the data
• Direction
• Y Intercept

[Figure: two scatter diagrams labeled Positive Correlation and Negative Correlation]
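The quantities listed for a linear relationship can be computed directly from paired data. The sketch below uses the standard Pearson correlation coefficient and a least-squares line, which is one common choice and not necessarily what the slides had in mind; the function name `linear_relationship` and the paired values are hypothetical.

```python
from math import sqrt
from statistics import mean


def linear_relationship(xs, ys):
    """Return (correlation, slope, intercept) for paired data."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    r = sxy / sqrt(sxx * syy)        # how well the data line up, in [-1, 1]
    slope = sxy / sxx                # steepness; its sign gives the direction
    intercept = my - slope * mx      # y intercept of the fitted line
    return r, slope, intercept


# Hypothetical paired observations (Data 1, Data 2).
data1 = [5, 10, 15, 20, 25, 30]
data2 = [8, 11, 19, 22, 29, 33]
r, slope, intercept = linear_relationship(data1, data2)
print(f"r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```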

Selecting the Family of Distributions

A family of distributions is selected based on:
• The context of the input variable
• Shape of the histogram

Frequently encountered distributions:
• Easier to analyze: Exponential, Normal and Poisson
• Harder to analyze: Beta, Gamma and Weibull

Selecting the Family of Distributions

Use the physical basis of the distribution as a guide, for example:
• Binomial: Number of successes in n trials
• Poisson: Number of independent events that occur in a fixed amount of time or space
• Normal: Distribution of a process that is the sum of a number of component processes
• Exponential: Time between independent events, or a process time that is memoryless
• Weibull: Time to failure for components
• Discrete or continuous uniform: Models complete uncertainty
• Triangular: A process for which only the minimum, most likely, and maximum values are known
• Empirical: Resamples from the actual data collected

Selecting the Family of Distributions

Remember the physical characteristics of the process
• Is the process naturally discrete or continuous valued?
• Is it bounded?

No “true” distribution for any stochastic input process
Goal: obtain a good approximation

Quantile-Quantile Plots

A Q-Q plot is a useful tool for evaluating distribution fit.
If X is a random variable with CDF F, then the q-quantile of X is the γ such that

$$F(\gamma) = P(X \le \gamma) = q, \quad \text{for } 0 < q < 1$$

• When F has an inverse, $\gamma = F^{-1}(q)$

Let {x_i, i = 1, 2, …, n} be a sample of data from X and {y_j, j = 1, 2, …, n} be this sample in ascending order:

$$y_j \text{ is approximately } F^{-1}\left(\frac{j - 0.5}{n}\right)$$

• where j is the ranking or order number

Quantile-Quantile Plots

The plot of $y_j$ versus $F^{-1}((j - 0.5)/n)$ is
• Approximately a straight line if F is a member of an appropriate family of distributions
• The line has slope 1 if F is a member of an appropriate family of distributions with appropriate parameter values

[Figure: Q-Q plot with y_j on one axis and F^{-1}() on the other]

Quantile-Quantile Plots – Example

Example: Door installation times of a robot follow a normal distribution.
• The observations are ordered from the smallest to the largest
• $y_j$ are plotted versus $F^{-1}((j - 0.5)/n)$ where F has a normal distribution with the sample mean (99.99 sec) and sample variance (0.2832² sec²)

j    Value
1    99.55
2    99.56
3    99.62
4    99.65
5    99.79
6    99.98
7    100.02
8    100.06
9    100.17
10   100.23
11   100.26
12   100.27
13   100.33
14   100.41
15   100.47
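The points of this Q-Q plot can be computed as sketched below. The 15 listed values and the stated parameters (mean 99.99 sec, standard deviation 0.2832 sec) are taken from the example; the function name `qq_points` is an assumption, and `statistics.NormalDist.inv_cdf` from the Python standard library plays the role of $F^{-1}$.

```python
from statistics import NormalDist


def qq_points(sample, dist):
    """Pair each ordered observation y_j with F^{-1}((j - 0.5) / n)."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(y, dist.inv_cdf((j - 0.5) / n)) for j, y in enumerate(ordered, start=1)]


# Door installation times (sec) as listed in the example.
times = [99.55, 99.56, 99.62, 99.65, 99.79, 99.98, 100.02, 100.06,
         100.17, 100.23, 100.26, 100.27, 100.33, 100.41, 100.47]

# Normal distribution with the mean and standard deviation stated in the example.
points = qq_points(times, NormalDist(mu=99.99, sigma=0.2832))
for y, q in points:
    print(f"{y:7.2f}  {q:8.3f}")
```

If the fitted distribution is appropriate, plotting the first column against the second should give points that lie close to a straight line.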

Quantile-Quantile Plots – Example

Example (continued): Check whether the door installation times follow a normal distribution.

[Figure: Q-Q plot of the door installation times. Straight line, supporting the hypothesis of a normal distribution]
[Figure: Superimposed density function of the normal distribution]

Quantile-Quantile Plots

Consider the following while evaluating the linearity of a Q-Q plot:
• The observed values never fall exactly on a straight line
• The ordered values are ranked and hence not independent, so it is unlikely for the points to be scattered about the line
• Variance of the extremes is higher than the middle. Linearity of the points in the middle of the plot is more important.

A Q-Q plot can also be used to check homogeneity
• It can be used to check whether a single distribution can represent two sample sets
• Given two random variables
  - X and x1, x2, …, xn
  - Z and z1, z2, …, zn
• Plotting the ordered values of X and Z against each other reveals approximately a straight line if X and Z are well represented by the same distribution

Parameter Estimation

Parameter Estimation: Next step after selecting a family of distributions.
If observations in a sample of size n are X1, X2, …, Xn (discrete or continuous), the sample mean and sample variance are:

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}, \qquad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}$$

If the data are discrete and have been grouped in a frequency distribution:

$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1}$$

• where $f_j$ is the observed frequency of value $X_j$ and k is the number of distinct values of X

Parameter Estimation

When raw data are unavailable (data are grouped into class intervals), the approximate sample mean and variance are:

$$\bar{X} = \frac{\sum_{j=1}^{c} f_j m_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1}$$

• $f_j$ is the observed frequency in the j-th class interval
• $m_j$ is the midpoint of the j-th interval
• c is the number of class intervals

A parameter is an unknown constant, but an estimator is a statistic.

Parameter Estimation – Example

Vehicle Arrival Example (continued): The table in the histogram of the Vehicle Arrival Example can be analyzed to obtain:

$$n = 100,\ f_1 = 12,\ X_1 = 0,\ f_2 = 10,\ X_2 = 1,\ \ldots, \quad \sum_{j=1}^{k} f_j X_j = 364, \quad \sum_{j=1}^{k} f_j X_j^2 = 2080$$

• The sample mean and variance are

$$\bar{X} = \frac{364}{100} = 3.64, \qquad S^2 = \frac{2080 - 100 \cdot (3.64)^2}{99} = 7.63$$

[Figure: histogram of Frequency versus Number of Arrivals per Period, 0 to 11]

• The histogram suggests X to have a Poisson distribution
  - However, note that the sample mean is not equal to the sample variance.
  - Theoretically: Poisson with parameter λ has μ = σ² = λ
  - Reason: each estimator is a random variable, it is not perfect.
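The arithmetic of this example can be reproduced with a few lines. The frequencies are those of the Vehicle Arrival Example; the variable names are chosen here for illustration only.

```python
# Frequency f_j of each observed value X_j (arrivals per period).
freq = {0: 12, 1: 10, 2: 19, 3: 17, 4: 10, 5: 8,
        6: 7, 7: 5, 8: 5, 9: 3, 10: 3, 11: 1}

n = sum(freq.values())                             # sample size
sum_fx = sum(f * x for x, f in freq.items())       # sum of f_j * X_j
sum_fx2 = sum(f * x * x for x, f in freq.items())  # sum of f_j * X_j^2

x_bar = sum_fx / n                                 # sample mean
s2 = (sum_fx2 - n * x_bar ** 2) / (n - 1)          # sample variance

print(n, sum_fx, sum_fx2, round(x_bar, 2), round(s2, 2))
```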

Parameter Estimation

Suggested Estimators for Distributions often used in Simulation
• Maximum-Likelihood Estimators

Distribution   Parameter   Estimator
Poisson        α           $\hat{\alpha} = \bar{X}$
Exponential    λ           $\hat{\lambda} = 1/\bar{X}$
Gamma          β, θ        $\hat{\theta} = 1/\bar{X}$
Normal         μ, σ²       $\hat{\mu} = \bar{X},\ \hat{\sigma}^2 = S^2$
Lognormal      μ, σ²       $\hat{\mu} = \bar{X},\ \hat{\sigma}^2 = S^2$ (after taking ln of the data)
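A minimal sketch of the suggested estimators for four of the listed distributions follows. The function names and the dictionary return format are assumptions for illustration. `statistics.variance` computes the sample variance $S^2$ with divisor n − 1, matching the definition used earlier in this chapter. The Gamma case is left out because the table gives only the estimator for θ.

```python
from math import log
from statistics import mean, variance


def fit_poisson(data):
    return {"alpha": mean(data)}


def fit_exponential(data):
    return {"lambda": 1.0 / mean(data)}


def fit_normal(data):
    return {"mu": mean(data), "sigma2": variance(data)}


def fit_lognormal(data):
    # Same estimators as the normal case, applied after taking ln of the data.
    return fit_normal([log(x) for x in data])


# Hypothetical interarrival times (minutes).
interarrival = [0.8, 2.3, 1.1, 0.4, 3.0, 1.9]
print(fit_exponential(interarrival))
```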

Goodness-of-Fit Tests

Conduct hypothesis testing on the input data distribution using
• Kolmogorov-Smirnov test
• Chi-square test

No single correct distribution in a real application exists
• If very little data are available, it is unlikely to reject any candidate distributions
• If a lot of data are available, it is likely to reject all candidate distributions

Be aware of mistakes in decision finding
• Type I Error: α
• Type II Error: β

Statistical Decision   State of the null hypothesis
                       H0 True        H0 False
Reject H0              Type I Error   Correct
Accept H0              Correct        Type II Error

Chi-Square Test

Intuition: comparing the histogram of the data to the shape of the candidate density or mass function
Valid for large sample sizes when parameters are estimated by maximum likelihood
Arrange the n observations into a set of k class intervals
The test statistic is:

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$

• $O_i$ is the observed frequency in the i-th class
• $E_i = n \cdot p_i$ is the expected frequency, where $p_i$ is the theoretical probability of the i-th interval. Suggested minimum: $E_i = 5$
• $\chi_0^2$ approximately follows the chi-square distribution with k-s-1 degrees of freedom
• s = number of parameters of the hypothesized distribution estimated by the sample statistics.

Chi-Square Test

The hypothesis of a chi-square test is
H0: The random variable, X, conforms to the distributional assumption with the parameter(s) given by the estimate(s).
H1: The random variable X does not conform.

H0 is rejected if $\chi_0^2 > \chi^2_{\alpha,\,k-s-1}$

If the distribution tested is discrete and combining adjacent cells is not required (so that Ei > minimum requirement):
• Each value of the random variable should be a class interval, unless combining is necessary, and

$$p_i = p(x_i) = P(X = x_i)$$

Chi-Square Test

If the distribution tested is continuous:

$$p_i = \int_{a_{i-1}}^{a_i} f(x)\,dx = F(a_i) - F(a_{i-1})$$

• where $a_{i-1}$ and $a_i$ are the endpoints of the i-th class interval
• f(x) is the assumed pdf, F(x) is the assumed cdf
• Recommended number of class intervals (k):

Sample Size, n   Number of Class Intervals, k
20               Do not use the chi-square test
50               5 to 10
100              10 to 20
> 100            n^(1/2) to n/5

• Caution: Different grouping of data (i.e., k) can affect the hypothesis testing result.

Chi-Square Test – Example

Vehicle Arrival Example (continued):
H0: the random variable is Poisson distributed.
H1: the random variable is not Poisson distributed.

$$E_i = n\,p(x) = n\,\frac{e^{-\alpha}\alpha^{x}}{x!}$$

xi      Observed Frequency, Oi   Expected Frequency, Ei   (Oi - Ei)²/Ei
0       12                       2.6                      7.87 (cells 0 and 1 combined: Oi = 22, Ei = 12.2)
1       10                       9.6
2       19                       17.4                     0.15
3       17                       21.1                     0.8
4       10                       19.2                     4.41
5       8                        14.0                     2.57
6       7                        8.5                      0.26
7       5                        4.4                      11.62 (cells 7 to ≥ 11 combined: Oi = 17, Ei = 7.6)
8       5                        2.0
9       3                        0.8
10      3                        0.3
≥ 11    1                        0.1
Total   100                      100.0                    27.68

• Cells are combined because of the assumption of min Ei = 5, e.g., E1 = 2.6 < 5, hence combine with E2
• Degree of freedom is k-s-1 = 7-1-1 = 5, hence, the hypothesis is rejected at the 0.05 level of significance.

$$\chi_0^2 = 27.68 > \chi^2_{0.05,\,5} = 11.1$$
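The computation in this example can be sketched as follows, using the observed frequencies from the Vehicle Arrival Example (10 and 8 for 4 and 5 arrivals, as in the histogram table) and α = 3.64. The helper `combine_cells` is a hypothetical greedy rule that merges adjacent cells from left to right until the expected frequency reaches the minimum of 5; for this data it reproduces the grouping used in the example. Because the expected frequencies are not rounded to one decimal here, the statistic comes out close to, but not exactly, 27.68.

```python
from math import exp, factorial

# Observed frequencies for x = 0..10 and for x >= 11 (last entry).
observed = [12, 10, 19, 17, 10, 8, 7, 5, 5, 3, 3, 1]
n = sum(observed)
alpha = 3.64  # estimated Poisson parameter (the sample mean)

# Poisson probabilities for x = 0..10, plus the tail P(X >= 11).
probs = [exp(-alpha) * alpha ** x / factorial(x) for x in range(11)]
probs.append(1.0 - sum(probs))
expected = [n * p for p in probs]


def combine_cells(obs, exp_freq, minimum=5.0):
    """Greedily merge adjacent cells until each has expected frequency >= minimum."""
    groups = []
    acc_o = acc_e = 0.0
    for o, e in zip(obs, exp_freq):
        acc_o += o
        acc_e += e
        if acc_e >= minimum:
            groups.append((acc_o, acc_e))
            acc_o = acc_e = 0.0
    if acc_e > 0 and groups:
        # Leftover tail cells are merged into the last complete group.
        last_o, last_e = groups.pop()
        groups.append((last_o + acc_o, last_e + acc_e))
    return groups


cells = combine_cells(observed, expected)
chi2_0 = sum((o - e) ** 2 / e for o, e in cells)
dof = len(cells) - 1 - 1  # k - s - 1 with s = 1 estimated parameter

print(f"k = {len(cells)}, degrees of freedom = {dof}, chi-square = {chi2_0:.2f}")
```

Comparing `chi2_0` against the critical value 11.1 for 5 degrees of freedom at the 0.05 level leads to the same rejection of H0.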

Kolmogorov-Smirnov Test

Intuition: formalize the idea behind examining a Q-Q plot
Recall:
• The test compares the continuous cdf, F(x), of the hypothesized distribution with the empirical cdf, SN(x), of the N sample observations.
• Based on the maximum difference statistic:

$$D = \max \lvert F(x) - S_N(x) \rvert$$

A more powerful test, particularly useful when:
• Sample sizes are small
• No parameters have been estimated from the data
When parameter estimates have been made:
• Critical values are biased, too large.
• More conservative, i.e., smaller Type I error than specified.
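A minimal sketch of the maximum difference statistic follows. Since the empirical cdf is a step function, the maximum is found by checking the gap just before and just after each ordered observation. The function name `ks_statistic`, the sample values, and the choice of an exponential cdf with a rate fixed in advance (not estimated from the data) are assumptions for illustration.

```python
from math import exp


def ks_statistic(sample, cdf):
    """D = max |F(x) - S_N(x)| for a continuous hypothesized cdf F."""
    ordered = sorted(sample)
    n = len(ordered)
    # Gap between the step just after each point and F, and between F and the step just before.
    d_plus = max((i + 1) / n - cdf(x) for i, x in enumerate(ordered))
    d_minus = max(cdf(x) - i / n for i, x in enumerate(ordered))
    return max(d_plus, d_minus)


# Hypothetical interarrival times, tested against an exponential cdf with rate 1.
interarrival = [0.44, 0.53, 2.04, 2.74, 2.00, 0.30, 2.54, 0.52, 2.02, 1.89]
d = ks_statistic(interarrival, lambda x: 1.0 - exp(-1.0 * x))
print(f"D = {d:.3f}")
```

The value of D would then be compared with a tabulated critical value for the sample size and significance level.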

p-Values and “Best Fits”

p-value for the test statistics
• The significance level at which one would just reject H0 for the given test statistic value.
• A measure of fit, the larger the better
• Large p-value: good fit
• Small p-value: poor fit

Vehicle Arrival Example (cont.):
• H0: data is Poisson
• Test statistics: $\chi_0^2 = 27.68$, with 5 degrees of freedom
• The p-value F(5, 27.68) = 0.00004, meaning we would reject H0 with 0.00004 significance level, hence Poisson is a poor fit.
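The p-value quoted for the vehicle arrival example can be checked with a small standard-library sketch. The function `chi2_sf` is a hypothetical helper that evaluates the upper tail probability of the chi-square distribution for integer degrees of freedom, starting from the closed forms for 1 and 2 degrees of freedom and adding one term per two extra degrees of freedom. A statistics package would normally be used instead.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, dof):
    """P(chi-square with `dof` degrees of freedom > x), for integer dof >= 1."""
    half = x / 2.0
    if dof % 2:
        k, q = 1, erfc(sqrt(half))   # closed form for 1 degree of freedom
    else:
        k, q = 2, exp(-half)         # closed form for 2 degrees of freedom
    while k < dof:
        # Going from k to k + 2 degrees of freedom adds this term to the tail.
        q += half ** (k / 2.0) * exp(-half) / gamma(k / 2.0 + 1.0)
        k += 2
    return q


p_value = chi2_sf(27.68, 5)
print(f"p-value = {p_value:.5f}")
```

The result agrees with the 0.00004 stated in the example after rounding.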

p-Values and “Best Fits”

Many software packages use the p-value as the ranking measure to automatically determine the “best fit”. Things to be cautious about:
• Software may not know about the physical basis of the data, so the distribution families it suggests may be inappropriate.
• Close conformance to the data does not always lead to the most appropriate input model.
• The p-value does not say much about where the lack of fit occurs

Recommended: always inspect the automatic selection using graphical methods.

Fitting a Non-stationary Poisson Process

Fitting a NSPP to arrival data is difficult, possible approaches:
• Fit a very flexible model with lots of parameters
• Approximate constant arrival rate over some basic interval of time, but vary it from time interval to time interval.
Suppose we need to model arrivals over time [0, T], our approach is the most appropriate when we can:
• Observe the time period repeatedly
• Count arrivals / record arrival times
• Divide the time period into k equal intervals of length Δt = T/k
• Over n periods of observation let Cij be the number of arrivals during the i-th interval on the j-th period

Page 42: 09 Input Modeling - RWTH Aachen University · 2007. 6. 26. · One of the bigggg g pest tasks in solving a real problem • GIGO –Garbage-In-Garbage-Out Raw Data Input Data Output

Computer Science, Informatik 4 Communication and Distributed Systems

Fitting a Non-stationary Poisson ProcessFitting a Non-stationary Poisson Process

The estimated arrival rate during the i-th time period, (i-1)Δt < t ≤ iΔt, is:

$$\hat{\lambda}(t) = \frac{1}{n\,\Delta t} \sum_{j=1}^{n} C_{ij}$$

• n = Number of observation periods
• Δt = Time interval length
• Cij = Number of arrivals during the i-th time interval on the j-th observation period

Example: Divide a 10-hour business day [8am, 6pm] into k = 20 equal intervals of length Δt = ½ hour, and observe over n = 3 days

| Time Period | Arrivals Day 1 | Arrivals Day 2 | Arrivals Day 3 | Estimated Arrival Rate (arrivals/hr) |
|---|---|---|---|---|
| 8:00 - 8:30 | 12 | 14 | 10 | 24 |
| 8:30 - 9:00 | 23 | 26 | 32 | 54 |
| 9:00 - 9:30 | 27 | 18 | 32 | 52 |
| 9:30 - 10:00 | 20 | 13 | 12 | 30 |

For instance, for the interval 8:30 - 9:00: (23 + 26 + 32) / (3 · 0.5) = 54 arrivals/hour
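The rate estimate can be reproduced with a few lines of Python. This is only a sketch: the function name `estimate_arrival_rates` and the nested-list layout of the counts are illustrative assumptions, not part of the slides. It uses three of the rows from the example.

```python
def estimate_arrival_rates(counts, dt):
    """Estimate the piecewise-constant arrival rate for each interval.

    counts[i][j] is the number of arrivals in interval i on observation
    period j; dt is the interval length (here in hours).
    """
    rates = []
    for row in counts:
        n = len(row)  # number of observation periods
        rates.append(sum(row) / (n * dt))
    return rates


# Counts taken from the example (rows: intervals, columns: days).
counts = [
    [12, 14, 10],  # 8:00 - 8:30
    [23, 26, 32],  # 8:30 - 9:00
    [20, 13, 12],  # 9:30 - 10:00
]
rates = estimate_arrival_rates(counts, dt=0.5)
```

Each entry of `rates` is in arrivals per hour because `dt` is given in hours.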


Selecting Model without Data


If data is not available, some possible sources to obtain information about the process are:
• Engineering data: often a product or process has performance ratings provided by the manufacturer, or company rules specify time or production standards.
• Expert opinion: people who are experienced with the process or similar processes can often provide optimistic, pessimistic and most-likely times, and they may know the variability as well.
• Physical or conventional limitations: physical limits on performance, limits or bounds that narrow the range of the input process.
• The nature of the process.

The uniform, triangular, and beta distributions are often used as input models.
• Speed of a vehicle?
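As an illustration of turning expert opinion into an input model, the sketch below builds a triangular distribution from optimistic, most-likely and pessimistic times. The numbers (2, 3 and 6 time units) and the helper name `sample_task_time` are hypothetical; only `random.triangular` comes from the Python standard library.

```python
import random


def sample_task_time(optimistic, most_likely, pessimistic, rng):
    """Draw one value from a triangular distribution built from expert estimates."""
    # random.triangular takes its arguments in the order (low, high, mode).
    return rng.triangular(optimistic, pessimistic, most_likely)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
samples = [sample_task_time(2.0, 3.0, 6.0, rng) for _ in range(10_000)]
mean_estimate = sum(samples) / len(samples)
# The theoretical mean of a triangular distribution is (low + mode + high) / 3.
```

A triangular model is a reasonable first choice when only three such numbers are known; a beta distribution offers more shape flexibility if the expert can also describe variability.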


Example: Production planning simulation.

• Input of sales volume of various products is required. The salesperson of product XYZ says that:
  - No fewer than 1000 units and no more than 5000 units will be sold.
  - Given her experience, she believes there is a 90% chance of selling more than 2000 units, a 25% chance of selling more than 2500 units, and only a 1% chance of selling more than 4500 units.
• Translating this information into a cumulative probability of being less than or equal to those goals for simulation input:

| i | Interval (Sales) | PDF | Cumulative Frequency, ci |
|---|---|---|---|
| 1 | 1000 <= X <= 2000 | 0.10 | 0.10 |
| 2 | 2000 < X <= 2500 | 0.65 | 0.75 |
| 3 | 2500 < X <= 4500 | 0.24 | 0.99 |
| 4 | 4500 < X <= 5000 | 0.01 | 1.00 |

[Figure: chart over the four sales intervals, vertical axis from 0.00 to 1.20]
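One way to use such a cumulative table as simulation input is inverse-transform sampling. The sketch below assumes that sales are uniformly spread within each interval (linear interpolation of the CDF); the slides do not state this, and the names `SALES`, `CUM_PROB` and `sales_from_uniform` are illustrative.

```python
import bisect
import random

# Breakpoints of the salesperson's cumulative distribution.
SALES = [1000, 2000, 2500, 4500, 5000]
CUM_PROB = [0.0, 0.10, 0.75, 0.99, 1.00]


def sales_from_uniform(u):
    """Map u in [0, 1] to a sales volume by inverse-transform sampling."""
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    # Find the interval whose cumulative range contains u.
    i = bisect.bisect_left(CUM_PROB, u)
    i = min(max(i, 1), len(CUM_PROB) - 1)
    lo_p, hi_p = CUM_PROB[i - 1], CUM_PROB[i]
    lo_x, hi_x = SALES[i - 1], SALES[i]
    # Assumption: sales are uniform within each interval (linear interpolation).
    return lo_x + (u - lo_p) / (hi_p - lo_p) * (hi_x - lo_x)


rng = random.Random(1)
draws = [sales_from_uniform(rng.random()) for _ in range(5)]
```

Feeding uniform random numbers through `sales_from_uniform` yields sales volumes whose cumulative probabilities at 2000, 2500 and 4500 units match the salesperson's statements.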


Multivariate and Time-Series Input Models


The random variables discussed until now were considered to be independent of any other variables within the context of the problem
• However, variables may be related
• If they appear as input, the relationship should be investigated and taken into consideration

Multivariate input models
• Fixed, finite number of random variables X1, X2, …, Xk
• For example, lead time and annual demand for an inventory model
• An increase in demand results in a lead time increase, hence the variables are dependent.

Time-series input models
• Infinite sequence of random variables, e.g., X1, X2, X3, …
• For example, time between arrivals of orders to buy and sell stocks
• Buy and sell orders tend to arrive in bursts, hence times between arrivals are dependent.


Covariance and Correlation

Consider the model that describes the relationship between X1 and X2:

$$(X_1 - \mu_1) = \beta\,(X_2 - \mu_2) + \varepsilon$$

where ε is a random variable with mean 0 and is independent of X2.
• β = 0, X1 and X2 are statistically independent
• β > 0, X1 and X2 tend to be above or below their means together
• β < 0, X1 and X2 tend to be on opposite sides of their means

Covariance between X1 and X2:

$$\operatorname{cov}(X_1, X_2) = E[(X_1 - \mu_1)(X_2 - \mu_2)] = E(X_1 X_2) - \mu_1 \mu_2$$

• where

$$\operatorname{cov}(X_1, X_2) \begin{cases} = 0 \\ < 0 \\ > 0 \end{cases} \;\Rightarrow\; \beta \begin{cases} = 0 \\ < 0 \\ > 0 \end{cases}, \qquad -\infty < \operatorname{cov}(X_1, X_2) < \infty$$


Correlation between X1 and X2 (values between -1 and 1):

$$\rho = \operatorname{corr}(X_1, X_2) = \frac{\operatorname{cov}(X_1, X_2)}{\sigma_1 \sigma_2}$$

• where

$$\operatorname{corr}(X_1, X_2) \begin{cases} = 0 \\ < 0 \\ > 0 \end{cases} \;\Rightarrow\; \beta \begin{cases} = 0 \\ < 0 \\ > 0 \end{cases}$$

• The closer ρ is to -1 or 1, the stronger the linear relationship is between X1 and X2.
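The link between the sign of β and the sign of the covariance and correlation follows from the linear model itself. A short derivation, assuming both variances are finite and positive and using that ε has mean 0 and is independent of X2:

$$
\begin{aligned}
\operatorname{cov}(X_1, X_2) &= E[(X_1 - \mu_1)(X_2 - \mu_2)] \\
&= E[(\beta (X_2 - \mu_2) + \varepsilon)(X_2 - \mu_2)] \\
&= \beta\, E[(X_2 - \mu_2)^2] + E[\varepsilon]\, E[X_2 - \mu_2] \\
&= \beta\, \sigma_2^2, \\
\rho &= \frac{\operatorname{cov}(X_1, X_2)}{\sigma_1 \sigma_2} = \beta\, \frac{\sigma_2}{\sigma_1}.
\end{aligned}
$$

Since σ1 and σ2 are positive, the covariance and the correlation always carry the same sign as β.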


A time series is a sequence of random variables X1, X2, X3, … which are identically distributed (same mean and variance) but dependent.
• cov(Xt, Xt+h) is the lag-h autocovariance
• corr(Xt, Xt+h) is the lag-h autocorrelation
• If the autocovariance value depends only on h and not on t, the time series is covariance stationary
• For a covariance stationary time series, the following shorthand for the lag-h autocorrelation is used:

$$\rho_h = \operatorname{corr}(X_t, X_{t+h})$$

Notice
• autocorrelation measures the dependence between random variables that are separated by h-1 other variables in the series
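The lag-h autocorrelation can be estimated from an observed series as in the following sketch. The function name and the choice of the same 1/(n-1) divisor for both the variance and the autocovariance are assumptions; other divisor conventions exist.

```python
def sample_autocorrelation(series, h):
    """Estimate the lag-h autocorrelation of a covariance-stationary series."""
    n = len(series)
    if not 0 < h < n:
        raise ValueError("lag h must satisfy 0 < h < len(series)")
    mean = sum(series) / n
    # Sum of squared deviations (proportional to the sample variance).
    sum_sq = sum((x - mean) ** 2 for x in series)
    # Sum of products of deviations that are h steps apart
    # (proportional to the lag-h sample autocovariance).
    sum_cross = sum(
        (series[t] - mean) * (series[t + h] - mean) for t in range(n - h)
    )
    # Both sums share the divisor n - 1, so it cancels in the ratio.
    return sum_cross / sum_sq


# A strictly alternating series is negatively correlated at lag 1
# and positively correlated at lag 2.
alternating = [1.0, -1.0] * 5
r1 = sample_autocorrelation(alternating, 1)
r2 = sample_autocorrelation(alternating, 2)
```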


Multivariate Input Models

If X1 and X2 are normally distributed, dependence between them can be modeled by the bivariate normal distribution with μ1, μ2, σ1², σ2² and correlation ρ
• To estimate μ1, μ2, σ1², σ2², see "Parameter Estimation"
• To estimate ρ, suppose we have n independent and identically distributed pairs (X11, X21), (X12, X22), …, (X1n, X2n)
• Then the sample covariance is

$$\widehat{\operatorname{cov}}(X_1, X_2) = \frac{1}{n-1} \sum_{j=1}^{n} (X_{1j} - \bar{X}_1)(X_{2j} - \bar{X}_2)$$

• The sample correlation is

$$\hat{\rho} = \frac{\widehat{\operatorname{cov}}(X_1, X_2)}{\hat{\sigma}_1 \hat{\sigma}_2}$$

where σ̂1 and σ̂2 are the sample standard deviations.


Multivariate Input Models - Example

Let X1 be the average lead time to deliver and X2 the annual demand for a product.
Data for 10 years is available.

| Lead Time (X1) | Demand (X2) |
|---|---|
| 6.5 | 103 |
| 4.3 | 83 |
| 6.9 | 116 |
| 6.0 | 97 |
| 6.9 | 112 |
| 6.9 | 104 |
| 5.8 | 106 |
| 7.3 | 109 |
| 4.5 | 92 |
| 6.3 | 96 |

$$\bar{X}_1 = 6.14,\ \hat{\sigma}_1 = 1.02, \qquad \bar{X}_2 = 101.8,\ \hat{\sigma}_2 = 9.93$$

Covariance:

$$c_{\text{sample}} = 8.66$$

$$\hat{\rho} = \frac{8.66}{1.02 \cdot 9.93} = 0.86$$

Lead time and demand are strongly dependent.
• Before accepting this model, lead time and demand should be checked individually to see whether they are represented well by a normal distribution.
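The example's statistics can be checked with a short script. This sketch uses only the standard library; the helper names are illustrative and the data are the ten (lead time, demand) pairs from the example.

```python
from math import sqrt

lead_time = [6.5, 4.3, 6.9, 6.0, 6.9, 6.9, 5.8, 7.3, 4.5, 6.3]
demand = [103, 83, 116, 97, 112, 104, 106, 109, 92, 96]


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_covariance(xs, ys):
    """Sample covariance with the 1/(n-1) divisor."""
    mx, my = sample_mean(xs), sample_mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_correlation(xs, ys):
    """Sample correlation: covariance divided by both sample standard deviations."""
    sx = sqrt(sample_covariance(xs, xs))  # cov(X, X) is the sample variance
    sy = sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (sx * sy)


cov_hat = sample_covariance(lead_time, demand)
rho_hat = sample_correlation(lead_time, demand)
```

Without intermediate rounding the correlation comes out near 0.85; the value 0.86 in the example results from dividing the rounded figures 8.66, 1.02 and 9.93.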


Time-Series Input Models

If X1, X2, X3, … is a sequence of identically distributed, but dependent and covariance-stationary random variables, then we can represent the process as follows:
• Autoregressive order-1 model, AR(1)
• Exponential autoregressive order-1 model, EAR(1)
  - Both have the characteristic that:

$$\rho_h = \operatorname{corr}(X_t, X_{t+h}) = \rho^h, \quad \text{for } h = 1, 2, \dots$$

  - Lag-h autocorrelation decreases geometrically as the lag increases, hence observations far apart in time are nearly independent


Time-Series Input Models – AR(1)

Consider the time-series model:

$$X_t = \mu + \phi\,(X_{t-1} - \mu) + \varepsilon_t, \quad \text{for } t = 2, 3, \dots$$

where ε2, ε3, … are i.i.d. normally distributed with μ_ε = 0 and variance σ_ε².

If the initial value X1 is chosen appropriately, then
• X1, X2, … are normally distributed with mean = μ and variance = σ_ε²/(1-φ²)
• Autocorrelation ρ_h = φ^h

To estimate φ, μ, σ_ε²:

$$\hat{\mu} = \bar{X}, \qquad \hat{\sigma}_\varepsilon^2 = \hat{\sigma}^2 (1 - \hat{\phi}^2), \qquad \hat{\phi} = \frac{\widehat{\operatorname{cov}}(X_t, X_{t+1})}{\hat{\sigma}^2}$$

where côv(Xt, Xt+1) is the lag-1 autocovariance and σ̂² is the sample variance of the series.
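The AR(1) recursion and its moment estimators can be sketched as follows. The function names, the parameter values (μ = 10, φ = 0.6, σ_ε = 2) and the 1/(n-1) divisor for the lag-1 autocovariance are assumptions made for illustration; the sketch requires |φ| < 1.

```python
import random
from math import sqrt


def generate_ar1(n, mu, phi, sigma_eps, rng):
    """Generate n values X_1..X_n from the AR(1) model (requires |phi| < 1)."""
    # Start X_1 from the stationary distribution N(mu, sigma_eps^2 / (1 - phi^2)).
    x = rng.gauss(mu, sigma_eps / sqrt(1.0 - phi ** 2))
    series = [x]
    for _ in range(n - 1):
        x = mu + phi * (x - mu) + rng.gauss(0.0, sigma_eps)
        series.append(x)
    return series


def fit_ar1(series):
    """Return (mu_hat, phi_hat, sigma_eps_sq_hat) from the moment estimators."""
    n = len(series)
    mu_hat = sum(series) / n
    var_hat = sum((x - mu_hat) ** 2 for x in series) / (n - 1)
    lag1_cov = sum(
        (series[t] - mu_hat) * (series[t + 1] - mu_hat) for t in range(n - 1)
    ) / (n - 1)
    phi_hat = lag1_cov / var_hat
    return mu_hat, phi_hat, var_hat * (1.0 - phi_hat ** 2)


rng = random.Random(7)  # fixed seed so the sketch is repeatable
data = generate_ar1(20_000, mu=10.0, phi=0.6, sigma_eps=2.0, rng=rng)
mu_hat, phi_hat, sigma_eps_sq_hat = fit_ar1(data)
```

With a long series the fitted values should land close to the parameters used for generation, which gives a quick sanity check of both the generator and the estimators.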


Time-Series Input Models – EAR(1)

Consider the time-series model:

$$X_t = \begin{cases} \phi X_{t-1}, & \text{with probability } \phi \\ \phi X_{t-1} + \varepsilon_t, & \text{with probability } 1 - \phi \end{cases} \quad \text{for } t = 2, 3, \dots$$

where ε2, ε3, … are i.i.d. exponentially distributed with μ_ε = 1/λ, and 0 ≤ φ < 1.

If X1 is chosen appropriately, then
• X1, X2, … are exponentially distributed with mean = 1/λ
• Autocorrelation ρ_h = φ^h, and only positive correlation is allowed.

To estimate φ, λ:

$$\hat{\lambda} = \frac{1}{\bar{X}}, \qquad \hat{\phi} = \hat{\rho} = \frac{\widehat{\operatorname{cov}}(X_t, X_{t+1})}{\hat{\sigma}^2}$$

where côv(Xt, Xt+1) is the lag-1 autocovariance
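A minimal sketch of generating and fitting an EAR(1) series is given below. The function names and the parameter values (λ = 2, φ = 0.5) are illustrative assumptions, as is the 1/(n-1) divisor in the lag-1 autocovariance.

```python
import random


def generate_ear1(n, lam, phi, rng):
    """Generate n values X_1..X_n from the EAR(1) model with mean 1/lam."""
    if not 0.0 <= phi < 1.0:
        raise ValueError("phi must satisfy 0 <= phi < 1")
    # Start X_1 from the stationary exponential distribution with rate lam.
    x = rng.expovariate(lam)
    series = [x]
    for _ in range(n - 1):
        if rng.random() < phi:
            x = phi * x  # with probability phi
        else:
            x = phi * x + rng.expovariate(lam)  # with probability 1 - phi
        series.append(x)
    return series


def fit_ear1(series):
    """Return (lam_hat, phi_hat) from the moment estimators."""
    n = len(series)
    mean = sum(series) / n
    var_hat = sum((x - mean) ** 2 for x in series) / (n - 1)
    lag1_cov = sum(
        (series[t] - mean) * (series[t + 1] - mean) for t in range(n - 1)
    ) / (n - 1)
    return 1.0 / mean, lag1_cov / var_hat


rng = random.Random(11)  # fixed seed so the sketch is repeatable
data = generate_ear1(50_000, lam=2.0, phi=0.5, rng=rng)
lam_hat, phi_hat = fit_ear1(data)
```

Because every term in the recursion is non-negative, the generated values stay non-negative, which is what makes the model suitable for quantities such as interarrival times.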


Summary

In this chapter, we described the 4 steps in developing input data models:
1) Collecting the raw data
2) Identifying the underlying statistical distribution
3) Estimating the parameters
4) Testing for goodness of fit
