risk analysis using simulation
DESCRIPTION
Process safety course workTRANSCRIPT
-
Risk Analysis using Simulation Page 1 of 29
Risk Analysis using simulation
Introduction Simulation is a process of understanding behavior of a system or
evaluating the various strategies for the operation of the system.
Simulation calculates multiple scenarios of a model by repeatedly sampling values from the probability distributions for the uncertain
variables.
Simulation involves the generation of the artificial history of the system and observation of the artificial history to draw inferences concerning the
operating characteristics of the real system that is presented.
Why Simulation? When it is too costly to do physical studies on the system itself (e.g., trying
alternative layout of a factory, building new facility)
The corresponding analytic models are too complicated to study (e.g. a queuing/transportation network)
When uncertainty, non-stationarity dominates the output results.
Steps in Simulation Model development: System defination, objectives to study, decision
variables, output measures, input variables and parameters.
Data collection: Collect data from the real system, obtain probability distributions of the input parameters by statistical analysis
Model translation: Translate the physical model into computer simulation software.
Simulation runs & Output Data Analysis: Run the simulation upto satisfactory level and use statistical analysis of the ouput data to estimate
the performance measures.
-
Risk Analysis using Simulation Page 2 of 29
Model Verification and Validation: z Verification: Ensuring that the model is free from logical errors. It
does what it is intended to do.
z Validation: Ensuring that the model is a valid representation of the whole system. Model outputs are compared with the real system
outputs.
Optimization/Experimental Design: Analyze alternative strategies on the validated simulation model.
Sensitivity analysis: Analuze the input data sensitivity in performance measure of the system. Cheking model robustmess.
Tools for simulation @Risk Crystal Ball MatLab Minitab Excel Spreadsheet And so on, etc
Simulation using Excel Spreadsheet
Basic Excel Skills To facilitate the understanding of simulation using Excel, it is necessary to be
familiar with the following features of Excel
Copying formula and cell references Function: It is used in performing special calculations in the Spread sheet
cells. Some of the more common functions that are usually used in
simulation models are including:
9 MlN (range) - finds the smallest value in a range of cells. 9 MAX (range) - finds the largest value in a range of cells 9 SUM (range) - finds the sum of values in a range of cells
-
Risk Analysis using Simulation Page 3 of 29
9 AVERAGE (range) - finds the average of the values in a range of cells
9 STDEV(range) -finds the standard deviation for a sample in a range of cells
9 AND (condition 1, condition 2, ) -a logical function that returns TRUE if all conditions are true, and FALSE if not.
9 OR (condition1, conditiott2 . . .) - a logical function that returns TRUE if any condition is true, and FALSE if not.
9 IF (condition, value if true, value if false) -a logical function that returns one value if the condition is true and another if the condition
is false.
9 VLOOKIUP (value, table range column number) -looks up a value in a table.
Charts and graphs Excel has many other functions for statistical, financial and other
applications
Others useful features Split Screen: Division of displayed spreadsheet screen Paste Special: Controlling of direct pasting into the cells Column and Row width: Customization of Column and Row size. Displaying Formulas in Worksheets: Showing actual formula in
spreadsheet
Displaying Grid Lines and Row, and Column Headers for Printing Filling a Range with a Series of Numbers. Comment Boxes
Building Simulation Models in Excel However a good design is essential to user understanding. Any good simulation
model design in any spreadsheet should be included with the following features.
A descriptive title
-
Risk Analysis using Simulation Page 4 of 29
A separate input data section area A separate working space A separate output section that provides the model results. Appropriate formatting such as in currency or comma formatting. Complex calculations should be divided into several cells to minimize the
chances of error and enhance understanding.
Comments should be placed next to formula cells or in comment boxes for explanation, if appropriate.
Example: A Simulation Model for Dave's Candies
Daves Candies is a small family owned business that offers gourmet chocolates
and ice cream fountain service. For special occasions such as Valentines day,
the store must place orders for special packaging several weeks in advance from
their supplier. One product, Valentines day chocolate massacre, is bought for
$7,50 a box and sells for $12.00. Any boxes that are not sold by February 14 are
discounted by 50% and can always be sold easily. Historically Daves candies
has sold between 40-90 boxes each year with no apparent trend. Daves
dilemma is deciding how many boxes to order for the Valentines day customers.
If the order quantity, Q is 70, what is the expected profit?
Formulation: Selling price=$12
Cost = $7.50
Discount price=$6
| If *DQ Profit=selling price*Q-cost*Q................................................[2]
*D=Demand & Q=Quantity
-
Risk Analysis using Simulation Page 5 of 29
Simulation model in Excel
Model description
The input data are given in Column A and Column B. Simulation results are shown in Columns D through F. IF function is used in Column F to calculate the profit based on order
demand and quantity relation using the equations [1] and [2].
-
Risk Analysis using Simulation Page 6 of 29
Each row in the results table represents one trial of the simulation. Demand in each row was generated from probability distribution. Since selling boxes range 40-90 boxes/year, the values in Column E were
generated by rolling a die and entered into the worksheet to verify the
formulas for profit in column F.
Probability & Statistic in Simulation
Fundamentals Probability - Probability is a measure of how likely a value or event is to
occur. It can be measured from simulation data as frequency by
calculating the number of occurrences of the value or event divided by the
total number of occurrences. This calculation returns a value between 0
and 1 which then can be converted to percentage by multiplying by 100.
Probabilistic Risk Assessment (PRA) - A risk assessment that yields a probability distribution for risk, generally by assigning a probability
distribution to represent variability or uncertainty in one or more inputs to
the risk equation.
Probability Density Function (PDF) - A function representing the probability distribution of a continuous random variable. The density at a
point refers to the probability that the variable will have a value in a narrow
range about that point.
Probability Distribution - A probability distribution is a set of probabilities associated with all possible outcomes of an uncertain event.
Probability Mass Function (PMF) - A function representing the probability distribution for a discrete random variable. The mass at a point
refers to the probability that the variable will have a value at that point.
Random Variable - A variable that may assume any value from a set of values according to chance. Discrete random variables can assume only a
finite or countable infinite number of values (e.g., number of rainfall events
per year). A random value is continuous if its set of possible values is an
entire interval of numbers (e.g., quantity of rain in a year).
-
Risk Analysis using Simulation Page 7 of 29
Central Tendency Exposure (CTE) - A risk descriptor representing the average or typical individual in a population, usually considered to be the
mean or median of the distribution.
Confidence Interval - Confidence intervals refers an interval that provides a range where the exact or true probability value of parameters may lie.
Confidence Limit - The upper or lower value of a confidence interval. Cumulative Distribution Function (CDF) - Obtained by integrating the
PDF, gives the cumulative probability of occurrence for a random
independent variable. Each value c of the function is the probability that a
random observation will be less than or equal to c.
Frequency Distribution or Histogram - A graphic (plot) summarizing the frequency of the values observed or measured from a population. It
conveys the range of values and the count (or proportion of the sample)
that was observed across that range.
Monte Carlo Analysis (MCA) or Monte Carlo Simulation - A technique for characterizing the uncertainty and variability in risk estimates by
repeatedly sampling the probability distributions of the risk equation inputs
and using these inputs to calculate a range of risk values.
Point Estimate - In statistical theory, a quantity calculated from values in a sample to estimate a fixed but unknown population parameter. Point
estimates typically represent a central tendency or upper bound estimate
of variability. [Source: U.S. EPA]
Types of Distribution: Continuous distribution: A probability distribution is continuous when
any value between the minimum and maximum is possible (has finite
probability). For example, an uncertainty function describing the possible
annual rainfall in Ithaca, New York next year would be a continuous
function since any value between 0 and some upper limit is possible.
-
Risk Analysis using Simulation Page 8 of 29
Discrete distribution: A discrete probability distribution has only a finite number of possible values between the maximum and minimum. For
example, an uncertainty function describing the outcome of a coin toss is
discrete since only two values are possible: heads or tails.
Descriptive Statistics
Parameter - A value that characterizes the distribution of a random variable.
Parameters commonly characterize the location, scale, shape, or bounds of the
distribution. For example, a truncated normal probability distribution may be
defined by four parameters: arithmetic mean [location], standard deviation
[scale], and min and max [bounds]. It is important to distinguish between a
variable (e.g., ingestion rate) and a parameter (e.g., arithmetic mean ingestion
rate).
Measures of Central Tendency: Mean, Median, & Mode
Mean: The mean of a set of values is the sum of all the values in the set divided by the total number of values in the set.
=
=N
ii
xN 11
Median: The median is the middle data point when the data is ordered. The middle value (50th percentile) in the ordered sequence of measured
values. For highly skewed data sets the median can give a better
-
Risk Analysis using Simulation Page 9 of 29
representation than the mean of the middle of the data distribution. The
median is not as comprehensive a measure of the data set as the mean.
Mode: The most likely value or mode is the value that occurs most often in a set of values. In a histogram and a result distribution, it is the center
value in the class or bar with the highest probability.
Measures of Variation
Standard deviation: The standard deviation is a measure of how widely dispersed the values are in a distribution. Equals the square root of the
variance.
1
)(1
2
==
N
xN
ii
Variance: The variance is a measure of how widely dispersed the values are in a distribution, and thus is an indication of the "risk" of the
distribution. It is calculated as the average of the squared deviations about
the mean. The variance gives disproportionate weight to "outliers", values
that are far away from the mean. The variance is the square of the
standard deviation.
Range: The range is the absolute difference between the maximum and minimum values in a set of values. The range is the simplest measure of
the dispersion or "risk" of a distribution
Kurtosis: Kurtosis is a measure of the shape of a distribution. Kurtosis indicates how flat or peaked the distribution is. The higher the kurtosis
value, the more peaked the distribution. The coefficient of kurtosis is
computed as:
41
4)(
=
=N
iix
CK
-
Risk Analysis using Simulation Page 10 of 29
Skewness: Skewness is a measure of the shape of a distribution. Skewness indicates the degree of asymmetry in a distribution. Skewed
distributions have more values to one side of the peak or most likely value
one tail is much longer than the other. A skewness of 0 indicates a
symmetric distribution, while a negative skewness means the distribution
is skewed to the left. Positive skewness indicates a skew to the right. The
coefficient of skewness is computed as:
31
3)(
=
=N
iix
CS
Coefficient of Variation: It is the ratio of standarad deviation and the sample mean.
=CV
Correlation: It is measure the interdependences of two variables. The sample correlation coefficient is computed as:
yx
N
iii
SSN
yyxxr
)1(
))((1
==
Random numbers & Probability distribution in Simulation It is a numerical description of the outcome of some experiment. Example
of outcome of experiment could be profit values, failure rate or chance of
an event occurrence, etc.
It can generated from a probability distribution. In simulation terminology, a random number is one that is uniformly
distributed in between 0 and 1.
Recall from statistic that uniform probability distribution characterizes a random variable for which all outcomes between a minimum value a and a
maximum value b are equally likely.
-
Risk Analysis using Simulation Page 11 of 29
Some Important Continuous Distribution Exponential Distribution: The exponential distribution is characterized
by the single parameter (.) The exponential random variable T (t > 0) has a probability density function
f(t) = e-t for t > 0
Where,
= mean number of occurrences per unit time t= number of time units until the next occurrence The cumulative distribution function is
00
( ) 1t
tx x tF t e dx e e = = = for t > 0. Mean: = and
1
Variance: 212
=
Lognormal Distribution: Distribution parameter (, )
-
Risk Analysis using Simulation Page 12 of 29
-
Risk Analysis using Simulation Page 13 of 29
Normal Distribution: Distribution parameter (, )
-
Risk Analysis using Simulation Page 14 of 29
Triangular Distribution : Distribution Parameter (min, m.likely, max)
-
Risk Analysis using Simulation Page 15 of 29
-
Risk Analysis using Simulation Page 16 of 29
Uniform Distribution : Distribution parameter (min, max)
-
Risk Analysis using Simulation Page 17 of 29
-
Risk Analysis using simulation Page 18 of 29
Some Important Discrete Distribution Binomial Distribution: Distribution parameter (n, p)
-
Risk Analysis using simulation Page 19 of 29
-
Risk Analysis using simulation Page 20 of 29
Poisson distribution: Distribution parameter ()
-
Risk Analysis using simulation Page 21 of 29
Generating random variables in Excel Two properties of probability distribution are useful to generate random numbers:
1. the probability of any outcome is always between 0 and 1 and
2. the sum of the probabilities of all outcomes adds to 1
Random numbers from discrete distribution
Divide the range from 0 to 1 into intervals that correspond to the probabilities of the discrete outcomes.
Use VLOOKUP function (look up value, table array, column index number) or Random number generation function in excel.
-
Risk Analysis using simulation Page 22 of 29
Example : Random number generation for Dave's Candies
-
Risk Analysis using simulation Page 23 of 29
Random numbers from continuous distribution
Inverse transform method is used in case if there is no direct function to generate random numbers.
Inverse transform method uses the following steps: 1. generate a random number from the Uniform distribution:
u=Uniform(0,1),
2. Calculate inverse cumulative distribution function (ICDF). In Excel, the
function name to calculate the ICDF for Normal distribution NORMINV
and for LogNormal distribution is Lognormal (LOGINV). RAND function
in excel can also be used to generate random numbers from the Uniform
distribution, and apply the built-in functions to calculate the ICDF.
Example: For example, the following formula will return the inverse CDF of the Normal
distribution with mean=1 and standard_deviation=2 evaluated at p=0.2:
=NORMINV(0.2; 1; 2)
Replacing 0.2 with RAND will yield the Normal random number generation formula:
=NORMINV(RAND(); 1; 2)
Similarly for LogNormal distribution the function is
LOGINV (probability, mean, Standard_dev)
For triangular distribution, It may need to use If function to generate random numbers
in Excel. The structure of the IF function is:
=IF(expression, what is returned if true, what is returned if false)
If function first determines which side of the distribution corresponds to the random
number and then evaluates the appropriate formula.
=IF (RAND () < (Mode Min)/(Max Min),Left formula, Right formula)
Let assume,
In a triangular distribution, if Min is a, Mode is b and Max. is c, then
bxaacabRANDaFormulaLeft += )(*)(*() cxbacbcRANDcFormulaRight = )(*)(*())1(
-
Risk Analysis using simulation Page 24 of 29
Monte-Carlo Simulation
Figure 1: Monte Carlo analysis to a model.
Monte-Carlo simulation uses the following steps for assessing parameters uncertainty:
i. Select a distribution to describe possible values of a parameter.
ii. Generate data from this distribution.
iii. Use the generated data as probable values of the parameters in the model to
produce output.
Monte-Carlo simulation on Excel Monte-Carlo simulation on excel spreadsheet follows the following steps::
1. Develop the spreadsheet model including a separate input and output
region.
2. Generate random numbers from the assigned probability distribution and
use the random data into the appropriate formula in the simulation model.
3. Repeat step 2 until to obtain the satisfactory output to create a distribution of
results.
-
Risk Analysis using simulation Page 25 of 29
4. Make a summary for the descriptive statistics and collect output data in a
frequency distribution or histogram for analysis.
Example: Monte-Carlo simulation on excel for Dave's Candies problem
-
Output
-
-
Example: Methyl-mercury has been inadvertently released to a
Monte Carlo simulation. With 90% (subjective) confidence, what
maximally exposed individual? The following table shows the (sub
distributions for this problem.
nearby lake. Use
is the risk to the
jective) probability
-
Risk Analysis using simulation Page 26 of 29
Solution: Step 1: Selecting distribution for model parameters 9 Table 1 is used as input for a Monte Carlo simulation
Table 1: Input data for MCS
Input Region Parameter Distribution Min. Max. Mean/Mode STD Fish concentration(CF), mg/kg Normal 2.06E-01 4.22E-02
Intake of Fish (IF),kg/d Uniform 2.00E-02 1.30E-01 6.50E-02
Methyl mercury RfD (RfDMM), mg/kg-d
Triangle 1.50E-04 3.00E-03 3.00E-04
Body mass (BM), kg Triangle 4.50E+01 1.20E+02 7.00E+01
Step 2: Generate data from this distribution.
9 Table 3 shows the generated dated for this example: Table 3: Random number generation
Sample Number CF IF RfD BM
1 0.209 0.032 5.39E-04 59.203 2 0.268 0.047 9.61E-04 66.066 3 0.203 0.052 2.35E-03 109.907 4 0.199 0.026 2.51E-04 95.667 5 0.140 0.123 3.71E-04 94.443 6 0.230 0.083 6.53E-04 60.822 7 0.191 0.025 2.77E-03 78.394 8 0.182 0.102 1.00E-03 80.310 9 0.197 0.095 1.36E-03 57.432
10 0.220 0.041 6.73E-04 84.090 11 0.218 0.099 7.61E-04 59.938 12 0.198 0.024 1.57E-03 88.117 - - - - - - - - - - - - - - -
96 0.265 0.098 2.90E-04 90.694 97 0.213 0.109 4.57E-04 84.283 98 0.233 0.113 1.55E-03 64.637 99 0.265 0.070 4.98E-04 93.816
100 0.161 0.066 6.63E-04 87.898
-
Risk Analysis using simulation Page 27 of 29
Step 3: Use the generated data as probable values in the model to calculate HQ.
9 Used model to calculate the Hazard Index (HI):
RfDBMICHI FF
=
Table 3: MCS to calculate HI using 100 iterations
Sample Number CF IF RfD BM HI
1 0.209 0.032 5.39E-04 59.203 2.12E-01 2 0.268 0.047 9.61E-04 66.066 1.97E-01 3 0.203 0.052 2.35E-03 109.907 4.07E-02 4 0.199 0.026 2.51E-04 95.667 2.13E-01 5 0.140 0.123 3.71E-04 94.443 4.92E-01 6 0.230 0.083 6.53E-04 60.822 4.79E-01 7 0.191 0.025 2.77E-03 78.394 2.16E-02 8 0.182 0.102 1.00E-03 80.310 2.32E-01 9 0.197 0.095 1.36E-03 57.432 2.40E-01
10 0.220 0.041 6.73E-04 84.090 1.59E-01 11 0.218 0.099 7.61E-04 59.938 4.71E-01 12 0.198 0.024 1.57E-03 88.117 3.48E-02 13 0.142 0.037 2.05E-03 64.939 3.92E-02 14 0.268 0.099 1.83E-03 87.285 1.66E-01 15 0.217 0.070 2.17E-03 76.220 9.23E-02 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
94 0.216 0.119 2.01E-03 69.327 1.85E-01 95 0.223 0.078 2.68E-03 74.652 8.68E-02 96 0.265 0.098 2.90E-04 90.694 9.83E-01 97 0.213 0.109 4.57E-04 84.283 6.04E-01 98 0.233 0.113 1.55E-03 64.637 2.63E-01 99 0.265 0.070 4.98E-04 93.816 3.95E-01
100 0.161 0.066 6.63E-04 87.898 1.83E-01
-
Risk Analysis using simulation Page 28 of 29
Output Summary: The output from MCS is shown in Fig 1. The frequency distribution for the output from MCS (Monte Carlo simulation) is
shown in Fig 2.
From Monte Carlo simulation, the confidence interval for 90% & 95% are obtained [0.193, 0.251] & [0.188, 0.256] respectively.
This implies that after taking into account the uncertainties on the parameters, one is highly confident (at a subjective level of 95%) that the true HI should lie
between 0.188 and 0.256.
Since the 95% upper confidence limit of HI is still below 1, there is high confidence that the maximally exposed individual for this scenario is not
exposed to an unacceptable level of risk, and remediation should not be
warranted.
Figure 1: MCS output
-
Risk Analysis using simulation Page 29 of 29
Forcast: Hazard Index (HI)
0
5
10
15
20
25
0.01
0.11
0.21
0.31
0.41
0.51
0.61
0.71
0.81
0.91
1.01
Hazard Index
Freq
uenc
Figure 2: Frequency distribution of HI
References: Hammonds, J. S. , Hoffman, F. O., and Bartell, S .M., An Introductory Guide to
Uncertainty Analysis in Environmental and Health Risk Assessment managed by
MARTIN MARIETTA ENERGY SYSTEMS, INC. for the U.S. DEPARTMENT OF
ENERGY
James R. Evans, David Louis Olson, Introduction to Simulation and Risk Analysis,
Prentice Hall PTR, Upper Saddle River, NJ, 2001
http://www.cas.lancs.ac.uk/glossary_v1.1/prob.html#pdf
http://www.ltcconline.net/greenl/courses/201/probdist/zScore.htm
Risk Analysis using simulation