
Latin hypercube sampling for uncertainty analysis in multiphase modelling

Amir Ali Khan, Leonard Lye, and Tahir Husain

Abstract: To facilitate the uncertainty analysis of the finite element multiphase multi-component transport model MOFAT, this paper provides guidance on latin hypercube sampling Monte Carlo (LHS-MC) sample size selection. To evaluate the ability of LHS-MC to produce output cumulative distribution functions (cdfs) that replicate random sampling Monte Carlo (RS-MC) cdfs, output cdfs obtained with LHS-MC sample sizes of 100, 300, and 500, and an RS-MC sample size of 10 000 are compared using the two sample Kolmogorov–Smirnov test. The LHS-MC cdfs for the three different sample sizes are able to accurately replicate the corresponding RS-MC cdfs for benzene, toluene, ethylbenzene, and xylene (BTEX) concentrations in the water, gas, and solid phases. The stability of LHS-MC is also evaluated by comparing three replicates of an LHS-MC sample. The three replicates are all able to accurately replicate the corresponding RS-MC cdfs for all BTEX concentrations in all three phases.

Key words: uncertainty analysis, Monte Carlo, latin hypercube sampling, sample size, MOFAT, multiphase, multi-component.

Résumé : Dans le but de faciliter l’analyse d’incertitude d’un modèle MOFAT de transport à éléments finis polyphasé et à multicomposantes, cet article se veut un guide pour la sélection de la dimension de l’échantillonnage latin hypercube Monte Carlo (LHS-MC). Pour évaluer la capacité du LHS-MC à produire des fonctions de distribution cumulative (cdfs) qui reproduit les cdfs de l’échantillonnage aléatoire Monte Carlo (RS-MC), les cdfs de sortie obtenues avec des tailles d’échantillons LHS-MC de 100, 300 et 500 ainsi qu’un échantillon RS-MC de taille 10,000 sont comparées en utilisant les tests de Kolmogoroff-Smirnoff à deux échantillons. Les cdfs LHS-MC pour les trois différentes tailles d’échantillons peuvent reproduire avec précision les cdfs RS-MC correspondantes pour les concentrations de benzène, de toluène, d’éthylbenzène et de xylène (BTEX) dans les phases liquides, gazeuses et solides. La stabilité du LHS-MC est également évaluée en comparant trois réplicats d’un LHS-MC. Les trois réplicats peuvent tous reproduire précisément les cdfs RS-MC correspondantes pour toutes les concentrations de BTEX dans les trois phases.

Mots-clés : analyse d’incertitude, Monte Carlo, échantillonnage latin hypercube, taille de l’échantillon, MOFAT, polyphasé, multicomposantes.

[Traduit par la Rédaction]

Introduction

The description and characterization of uncertainty introduced into the modelling process by parametric variability is an important component of any Tier 3 Risk Based Corrective Action (RBCA) of petroleum contaminated sites. Uncertainty introduced in fate and transport modelling outputs due to parametric variability in model inputs needs to be characterized since any uncertainty associated with the exposure modelling process is eventually propagated through the risk assessment computations.

Despite the availability of many techniques for uncertainty analysis, the complexity of the fate and transport models used in Tier 3 RBCA means that the applicability of these techniques is model specific and requires an evaluation on a model-by-model basis. Unfortunately, such model specific guidance on the applicability and efficiency of various uncertainty and sensitivity analysis methods is not available for most Tier 3 RBCA fate and transport models. This absence of model specific guidance has resulted in a state of affairs that has been aptly described by Saltelli et al. (2004) as one where “uncertainty and sensitivity analysis are more often mentioned than practiced”.

To facilitate the practice of uncertainty analysis, this paper evaluates the ability of the latin hypercube sampling Monte Carlo (LHS-MC) technique to replicate the results of the more exhaustive random sampling based Monte Carlo (RS-MC) technique for the United States Environmental Protection Agency’s finite element multiphase multi-component transport model MOFAT. MOFAT is used for the fate and transport modelling of petroleum releases. It has been identified as a Tier 3 RBCA model in the American Society for Testing and Materials’ “RBCA Fate and Transport Models: Compendium and Selection Guidance” (ASTM 1999).

Received 5 February 2007. Revision accepted 15 July 2008. Published on the NRC Research Press Web site at jees.nrc.ca on 1 October 2008.

A.A. Khan.1 Hydrologic Modelling Section, Department of Environment and Conservation, Government of NL, West Block, Fourth Floor, PO Box 8700, St. John’s, NL A1B 4J6, Canada.
L. Lye and T. Husain. Faculty of Engineering and Applied Science, Memorial University of Newfoundland, St. John’s, NL A1B 3X5, Canada.

Written discussion of this article is welcomed and will be received by the Editor until 31 January 2009.

1 Corresponding author (e-mail: [email protected]).

J. Environ. Eng. Sci. 7: 617–626 (2008) doi:10.1139/S08-031 © 2008 NRC Canada

To further provide appropriate guidance to MOFAT users on LHS-MC sample size selection, this paper evaluates the ability of different LHS-MC sample sizes to produce cumulative distribution functions (cdfs) that replicate corresponding RS-MC cdfs.

Practitioners of uncertainty analysis also need guidance on the stability of LHS-MC (i.e., do estimates of uncertainty change significantly with different LHS-MC replicates of the same sample size?). An LHS-MC replicate is an LHS-MC sample generated using a different random seed. This paper tests the stability of LHS-MC replicates by evaluating if different LHS-MC replicates of the same sample size are able to produce cdfs that accurately replicate corresponding RS-MC cdfs.

Uncertainty analysis techniques

The benchmark technique for uncertainty analysis is RS-MC, which refers to the traditional method of sampling random variables in simulation modelling. RS-MC simulation is a robust technique and is one of the most widely used methodologies to account for parameter variability in groundwater flow and contaminant transport. Among the strengths of RS-MC are that it is easy to program and apply, is amenable to analytical and numerical models, and produces unbiased estimates of the mean and variance of the output variables (Saltelli et al. 2004).

A major disadvantage of RS-MC is that it is computationally intensive, and for long running models the total simulation time may in itself be prohibitive. For complex transport problems in large heterogeneous domains (Freeze et al. 1990) or for high order systems (Helton and Davis 2000), RS-MC analysis may be computationally prohibitive, especially since it is important to ensure that the RS-MC simulations preserve the input probability distributions by exhaustively sampling from various points of the input distributions. The number of simulations used in an RS-MC uncertainty analysis is model and problem specific and may be as high as 10 000 simulations. For most RBCA applications conducting a large number of simulations is not practical. This is especially so if the uncertainty analysis of different scenarios needs to be undertaken as part of the RBCA analysis.

Various modifications of the RS-MC technique have been developed over the years with the aim of reducing the computational effort. These modifications work by changing the sampling procedure. One such modified sampling technique is LHS-MC. LHS-MC was developed by McKay et al. (1979). It uses stratified sampling without replacement to reduce variance (Helton and Davis 2003). A detailed description of the method and its application for uncertainty analysis of complex systems can be found in Helton and Davis (2003). The LHS-MC technique forces the sampling to select values over the whole range of a model parameter, thereby reducing the total number of samples required to preserve the probability distributions. This significantly improves the computational efficiency of the uncertainty analysis and consequently LHS-MC is generally recommended over RS-MC when the model is complex or when time and resource constraints are an issue (USEPA 1997).

When the output is a monotonic function of its inputs, LHS-MC is proven to be better than RS-MC in describing the mean and the population distribution function (McKay et al. 1979; Campolongo et al. 2000; Helton and Davis 2003). LHS-MC is better than RS-MC in that it provides an estimator (of the expectation of the output function) with lower variance. The closer the output function is to being additive (i.e., linear) in its input quantities, the greater is the reduction in variance (Stein 1987; Campolongo et al. 2000; Helton and Davis 2003; Saltelli et al. 2004). Another aspect of LHS-MC is that it performs better than RS-MC when the output is dominated by a few components of the input factors (Campolongo et al. 2000; Saltelli et al. 2004).
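The stratification idea described above can be illustrated with a minimal sketch. It is not the LHS program of Iman and Shortencarier (1984) used in this study, and it does not impose any correlation structure; the function name `latin_hypercube`, the seed, and the sample size of 20 are assumptions chosen only for illustration. Each parameter's unit interval is split into equal-probability strata, one value is drawn per stratum, and the strata are shuffled independently per parameter.

```python
import numpy as np

def latin_hypercube(n_samples, n_params, rng):
    """Return an (n_samples, n_params) array of LHS points on the unit hypercube."""
    # One uniform draw inside each of the n_samples equal-probability strata.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_params))) / n_samples
    # Shuffle each column independently so strata are paired at random.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    return u

rng = np.random.default_rng(0)          # hypothetical seed
lhs = latin_hypercube(20, 7, rng)       # 20 runs, 7 parameters (illustrative)
rs = rng.random((20, 7))                # plain random sampling for comparison

# Number of the 20 equal-probability intervals hit for the first parameter.
lhs_hit = np.unique(np.floor(lhs[:, 0] * 20).astype(int)).size
rs_hit = np.unique(np.floor(rs[:, 0] * 20).astype(int)).size
print("intervals covered, LHS:", lhs_hit, "random:", rs_hit)
```

The LHS design covers every interval by construction, whereas plain random sampling typically leaves some intervals empty at the same sample size. Points on the unit hypercube would then be mapped to parameter values through each parameter's inverse cdf.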

Modelling approach

Multiphase and multi-component transport model

MOFAT simulates flow only or coupled multiphase flow and multi-component transport in planar or radially symmetric vertical sections. The flow module can be used to analyze two-phase flow of water and non aqueous phase liquid (NAPL) or explicit three-phase flow of water, NAPL and gas at variable pressure. The transport module can handle up to five non-inert chemical components that partition among water, NAPL, gas, and solid phases. The transport equations are solved serially with the flow equations. Governing equations are solved using an efficient upstream-weighted finite element scheme. MOFAT achieves a high degree of computational efficiency by using an adaptive solution domain algorithm that confines the mathematical solution domain to a sub-domain within which transient oil flow occurs. Three phase permeability–saturation–capillary pressure relations are defined by an extension of the Van Genuchten model, which considers effects of oil entrapment during periods of water imbibition (Katyal et al. 1991). The model accounts for non equilibrium phase partitioning through the use of apparent partition coefficients in lieu of equilibrium coefficients. In non equilibrium phase partitioning, for any two phases that are in physical contact, the rate of mass transfer is described by first order mass transfer functions (Katyal et al. 1991).

Required input for flow analyses consists of initial conditions, soil hydraulic properties, fluid properties, time integration parameters, boundary condition data, and mesh geometry. For transport analyses, additional input data are porous media dispersivities, initial water phase concentrations, equilibrium partition coefficients, component densities, diffusion coefficients, first-order decay coefficients, mass transfer coefficients (for non equilibrium analyses) and boundary condition data. Table 1 presents the soil, component and bulk fluid parameters that are required by MOFAT for flow and transport analyses.

Program output consists of basic information on input parameters, mesh details and initial conditions plus pressure heads, saturation and velocities for each phase at every node for specified output intervals. For transport analyses, the component concentrations in each phase at each node are output at each printout interval.


Soil parameter variability

Due to the inherent spatial heterogeneity of the soil media, soil properties are the primary source of parameter variability in MOFAT inputs. The bulk fluid and component properties are generally not a major source of variability as they are mostly constants by definition.

Consequently, this study focused on variability in the seven soil input parameters required for flow analysis as shown in Table 1. The bulk fluid and component properties were modeled as deterministic inputs. The intention of the uncertainty study was to demonstrate a methodology with general applicability, so all the site-specific parameters (first-order decay rate coefficients, diffusion coefficients and dispersivities) that could not be estimated or predicted for a previously unstudied field situation were set equal to zero. The soil medium was assumed to be isotropic, i.e., the hydraulic conductivities in the vertical and the horizontal directions were assumed to be equal.

In keeping with the intent to demonstrate a methodology with general applicability, soil parameter probabilities and the correlation structure between the selected seven parameters were defined using a soil property database published by Carsel and Parrish (1988). This database has been compiled using Soil Conservation Service (SCS) soil survey information reports from 42 states and has previously been used to characterize input parameters for an RS-MC uncertainty analysis of pesticide leaching using the unsaturated zone pesticide root zone model (Carsel and Parrish 1988).

Joint probability distributions published by Carsel and Parrish (1988) were used to model saturated conductivity to water, Ksw, residual water content, θr, Van Genuchten air–water capillary retention parameter, α, and Van Genuchten air–water capillary retention parameter, n. Table 2 summarizes the probability distributions and the distribution parameters used to describe the variability of Ksw, θr, α, and n for the SCS textural classification “loam”. The “loam” soil classification was selected as an illustration for this study. The values of Ksw, θr, α, and n are listed as transformed in Table 2 because Carsel and Parrish (1988) used the Johnson family of distributions to convert Ksw, θr, α, and n to normal distributions. Carsel and Parrish (1988) did this to compute Pearson product-moment correlations and covariances for the transformed variables.
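To make the idea of a "transformed variable" concrete, the sketch below samples a normal variate in transformed space and maps it back to a bounded physical value. It assumes, for illustration only, a bounded Johnson-type (SB) transform y = ln((x − a)/(b − x)); the source does not state which member of the Johnson family applies to which parameter, and the bounds `LOWER` and `UPPER` are hypothetical. The mean and standard deviation are taken from the residual water content row of Table 2 purely as example numbers.

```python
import numpy as np

def johnson_sb_forward(x, a, b):
    """Map a value bounded on (a, b) to the unbounded transformed scale."""
    return np.log((x - a) / (b - x))

def johnson_sb_inverse(y, a, b):
    """Map a transformed (normally distributed) value back to (a, b)."""
    ey = np.exp(y)
    return (a + b * ey) / (1.0 + ey)

LOWER, UPPER = 0.0, 0.12          # hypothetical bounds, not from the source
MEAN_Y, SD_Y = -0.8659, 2.1439    # transformed mean and std deviation (Table 2)

rng = np.random.default_rng(1)    # hypothetical seed
y = rng.normal(MEAN_Y, SD_Y, size=1000)   # sample in transformed space
x = johnson_sb_inverse(y, LOWER, UPPER)   # back-transform to physical scale

print("physical range of the samples:", x.min(), x.max())
```

Sampling and correlating in the transformed space and back-transforming afterwards keeps every sampled value inside its physical bounds.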

The apparent irreducible water saturation, Sm, was calculated using the maximum water content, θm, and residual water content, θr, information from Carsel and Parrish (1988) as follows (Katyal et al. 1991):

\[ S_m = \frac{\theta_r}{\theta_m} \tag{1} \]

The maximum residual oil saturation for water, Sor, and soil porosity, φ, were modeled as uniform distributions based on the ranges presented in the user’s manual for MOFAT (Katyal et al. 1991).

The Pearson product-moment correlations for the “loam” SCS soil textural classification are presented in Table 3. The Pearson product-moment correlation coefficient is a measure of the linear correlation between two variables. It ranges from +1 to –1. A correlation of +1 indicates a perfect positive linear correlation between variables, while a correlation of –1 indicates a perfect negative linear correlation between variables. A correlation of 0 means there is no linear correlation between the two variables. Table 3 shows that Ksw is positively correlated with θr, α, and n. The correlations between Ksw and α and n are especially strong. θr shows a strong negative correlation with n. It is also weakly negatively correlated with α. α and n show a strong positive correlation.
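The three reference cases of the Pearson coefficient described above (+1, –1, and 0) can be checked with a short sketch. The helper name `pearson_r` and the toy data are assumptions for illustration and are unrelated to the soil database.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient of two 1-D arrays."""
    dx = np.asarray(x, dtype=float) - np.mean(x)
    dy = np.asarray(y, dtype=float) - np.mean(y)
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2)))

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

r_pos = pearson_r(x, 2.0 * x + 1.0)     # perfect positive linear relation
r_neg = pearson_r(x, -3.0 * x)          # perfect negative linear relation
r_zero = pearson_r(x, (x - 3.0) ** 2)   # dependent, but no linear correlation

print(r_pos, r_neg, r_zero)
```

The last case shows that a coefficient of 0 rules out only a linear relationship: the second variable is completely determined by the first.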

Simulation setup

The site scenario used for this study is a hypothetical one based on site scenarios used in MOFAT validation studies by Parker (1989), Kaluarachchi and Parker (1989), and Kaluarachchi and Parker (1990). The scenarios used by Sleep and Sykes (1989) to model the transport of volatile organics in variably saturated media were also part of the comparison group used to develop the site scenario. A scenario based on site scenarios used in MOFAT validation studies was used to keep model error at a minimum by using the model within its validation context.

Table 1. Soil, component and bulk fluid input parameters for MOFAT.

| Property | Used in MOFAT to model | Input parameter |
| --- | --- | --- |
| Soil | Flow | Saturated conductivity to water in the vertical direction, Kswz |
| | | Saturated conductivity to water in the horizontal direction, Kswx |
| | | Soil porosity, φ |
| | | Apparent irreducible water saturation, Sm |
| | | Maximum residual oil saturation for water, Sor |
| | | Van Genuchten air–water capillary retention parameter, α |
| | | Van Genuchten air–water capillary retention parameter, n |
| | Transport | Longitudinal dispersivity, AL |
| | | Transverse dispersivity, AT |
| | Non-equilibrium phase partitioning | Non equilibrium coefficients, koa, kwa, kow, and kws |
| Component | Transport | The diffusion coefficients in bulk water, oil and air |
| | | The oil–water, air–water, and solid–water partition coefficients |
| | | The first-order decay coefficients (water, oil, air, and solid phases) |
| Bulk fluid | Flow | The scaling coefficients, βao and βow |
| | | Specific gravities of the NAPL, ρro, and gas phases |
| | | Relative viscosities of the NAPL and gas phases |

The physical domain simulated is a 24 m long vertical slice through an aquifer with a distance of 10 m from the soil surface to the aquifer bottom. As depicted in Fig. 1, it is represented by a finite element mesh of 1029 nodes with an inter-nodal spacing of 0.5 m. A fine grid resolution, as opposed to a coarse grid resolution, was adopted to minimize approximations and uncertainties in the model results.

A water table occurs at a depth of 5 m on the left boundary and a depth of 3 m on the right boundary, which is maintained throughout the simulation, resulting in continuous groundwater flow to the right.

A hydrocarbon spill on a 4 m wide strip source at the upper surface is simulated by permitting infiltration of a prescribed volume of 1 m³ under a head of 0.1 m. Once the hydrocarbon has infiltrated into the soil (infiltration stage), the inter-phase mass transfer and transport (re-distribution stage) of the hydrocarbon is simulated while subject to zero boundary flux. The hydrocarbon simulated is a benzene, toluene, ethylbenzene, and xylene (BTEX) mixture consisting of equal volumes of each component.

The results were evaluated at an X coordinate of 14 m and a Y coordinate of 8.5 m after a simulation time of 3.1 d. The origin (coordinates 0 m, 0 m) of the coordinate system is the lower left corner of the grid.

Simulations

Uncertainty analysis techniques are evaluated for their ability to produce output cdfs that replicate corresponding RS-MC output cdfs. They are also evaluated from the perspective of computational efficiency.

A sample size of 10 000 was chosen for the RS-MC simulations to ensure that the random samples had adequately sampled the complete range of the input probability distributions. The adequacy of the sample size was confirmed by comparing the mean and standard deviation for all the simulation outputs after each simulation with the same statistics for the previous simulation. The mean and standard deviation of RS-MC simulation outputs stop changing when the RS-MC has adequately sampled the complete range of the input probability distributions. The RS-MC simulation means and standard deviations for toluene concentrations in the water, gas, and solid phases at the node located at X and Y coordinates of 14 m and 8.5 m are presented in Figs. 2 and 3, respectively. Figures 2 and 3 show that the means and standard deviations stabilized for toluene concentrations in all three phases. The results for the other BTEX components were similar. This illustrates that the RS-MC size of 10 000 used in this study is adequate.
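The convergence check described above, tracking the mean and standard deviation after each successive simulation, can be sketched as follows. The synthetic lognormal `outputs` array stands in for MOFAT concentrations and is an assumption for illustration only, as are the function name and the seed.

```python
import numpy as np

def running_mean_std(outputs):
    """Mean and sample standard deviation after each successive simulation."""
    x = np.asarray(outputs, dtype=float)
    n = np.arange(1, x.size + 1)
    mean = np.cumsum(x) / n
    var = np.zeros_like(mean)
    # Sample variance from cumulative sums; undefined for n = 1, left at 0.
    var[1:] = (np.cumsum(x**2)[1:] - n[1:] * mean[1:] ** 2) / (n[1:] - 1)
    return mean, np.sqrt(np.maximum(var, 0.0))

rng = np.random.default_rng(42)                             # hypothetical seed
outputs = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)   # stand-in model outputs

mean, std = running_mean_std(outputs)

# Relative change of each statistic from one simulation to the next.
rel_change_mean = np.abs(np.diff(mean)) / np.abs(mean[1:])
rel_change_std = np.abs(np.diff(std[1:])) / std[2:]
print("last relative changes:", rel_change_mean[-1], rel_change_std[-1])
```

Plotting `mean` and `std` against the simulation number gives curves of the kind shown in Figs. 2 and 3; the sample is judged adequate once both curves flatten.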

In LHS-MC simulations the sampling process is supervised and is designed to cover the range of each input variable, so the means and standard deviations of the outputs are not tracked to ensure that the LHS-MC has adequately sampled the complete range of the input probability distributions. To illustrate how the standard deviations stabilize in LHS-MC simulations, the standard deviations of the LHS-MC sample sizes 100 and 500 for toluene concentrations in the water, gas, and solid phases at the node located at X and Y coordinates of 14 m and 8.5 m are presented in Figs. 4 and 5, respectively.

For LHS-MC simulations, when the number of variables is large, Iman and Helton (1985) report that good results can be obtained if the sample size is between (4/3) × (number of parameters) and 5 × (number of parameters). The appropriate LHS-MC sample size to be used in an uncertainty analysis also depends on the quantiles that are to be estimated in the uncertainty analysis. For estimating the 0.95 quantile, which is the quantile of interest in risk assessment studies, an LHS-MC sample size of at least 20 is required. A sample size of 20 ensures that each variable is divided into 20 intervals having a probability of 0.05 each.

Table 2. Input parameters, probability distributions and the distribution statistics.

| Parameter | Distribution | Distribution statistics | Source |
| --- | --- | --- | --- |
| Ksw | Normal | Transformed variable: mean –9.2106, std deviation 1.7906 | Carsel and Parrish (1988) |
| φ | Uniform | Untransformed variable: minimum 0.15, maximum 0.35 | Katyal et al. (1991) |
| θr (used to calculate Sm) | Normal | Transformed variable: mean –0.8659, std deviation 2.1439 | Carsel and Parrish (1988) |
| Sor | Uniform | Untransformed variable: minimum 0.3, maximum 0.5 | Katyal et al. (1991) |
| α | Normal | Transformed variable: mean –3.6989, std deviation 1.1589 | Carsel and Parrish (1988) |
| n | Normal | Transformed variable: mean 0.2261, std deviation 0.8379 | Carsel and Parrish (1988) |

Table 3. Pearson product moment correlation matrix.

| | Ksw | θr | α | n |
| --- | --- | --- | --- | --- |
| Ksw | 1.0000 | | | |
| θr | 0.2040 | 1.0000 | | |
| α | 0.9820 | –0.0860 | 1.0000 | |
| n | 0.6320 | –0.7480 | 0.5910 | 1.0000 |

Consequently, when the estimation of very high quantiles is required, LHS-MC should not be used since the more subjective stratified sampling technique “importance sampling” is more efficient (Helton and Davis 2000; Helton and Davis 2003).

Using the latter formula of 5 × (number of parameters), Khan et al. (2008) selected an LHS-MC sample size of 35. To evaluate the effect of LHS-MC sample size, Khan et al. (2008) also selected a second sample of size 100. For these two sample sizes, correlated LHS samples were generated using Iman and Shortencarier’s (1984) LHS program. For the extended analysis being presented in this paper an additional three replicated LHS-MC samples of size 100 each, an LHS-MC sample of 300 and an LHS-MC sample of 500 were generated using Iman and Shortencarier’s (1984) LHS program. These samples were simulated using MC-MOFAT. MC-MOFAT is a batch input version of MOFAT compiled to run in a UNIX environment (a Compaq AlphaServer DS10 running Tru64 UNIX 5.1 was used in this study). Each simulation took an hour on average to assemble, execute, and post process the outputs.
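The sample-size guidance discussed in this section reduces to simple arithmetic, sketched below for the seven soil parameters of this study. The function names are hypothetical, and the quantile rule is written here as n ≥ 1/(1 − q), which is one reading of the statement that a sample of 20 yields intervals of probability 0.05 for the 0.95 quantile.

```python
import math

def lhs_sample_size_range(n_params):
    """Range (4/3)k to 5k for k parameters, with the lower end rounded up."""
    return math.ceil(4 * n_params / 3), 5 * n_params

def min_size_for_quantile(q):
    """Smallest n whose equal-probability intervals are no wider than 1 - q."""
    # Rounding guards against floating-point noise in 1 / (1 - q).
    return math.ceil(round(1.0 / (1.0 - q), 9))

low, high = lhs_sample_size_range(7)     # seven soil input parameters
n_q95 = min_size_for_quantile(0.95)

print(low, high, n_q95)   # 10 35 20
```

The upper end of the range, 35, is the sample size selected by Khan et al. (2008), and it exceeds the minimum of 20 associated with the 0.95 quantile.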

Results and discussion

Fig. 1. Finite element mesh representing simulation scenario.

Fig. 2. Variation of means of toluene concentrations in water, gas, and solid phases at node (14 m, 8.5 m) with RS-MC simulation numbers.

Fig. 3. Variation of standard deviation of toluene concentrations in water, gas, and solid phases at node (14 m, 8.5 m) with RS-MC simulation numbers.

Fig. 4. Variation of standard deviation of toluene concentrations at node (14 m, 8.5 m) with LHS-MC simulation number for an LHS-MC sample of 100 runs.

Fig. 5. Variation of standard deviation of toluene concentrations at node (14 m, 8.5 m) with LHS-MC simulation number for an LHS-MC sample of 500 runs.

Fig. 6. Comparison of cdfs for toluene generated using three different LHS-MC sample sizes.

To evaluate the ability of LHS-MC cdfs from different LHS-MC sample sizes to replicate corresponding RS-MC cdfs and to evaluate the stability of LHS-MC samples, output cdfs were plotted for each one of the three replicated LHS-MC 100 samples, the LHS-MC 300 sample, the LHS-MC 500 sample and the 10 000 run RS-MC sample. To plot the cdfs, the output data, consisting of BTEX concentrations in the water, gas, and solid phases, were arranged in descending order, ranked and then assigned a plotting position using the Cunnane plotting position (Cunnane 1978).
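The plotting-position step can be sketched as follows, using the Cunnane formula p = (i − 0.4)/(n + 0.2) for rank i in a sample of size n. The sketch ranks in ascending order, so p is a non-exceedance probability (a cdf value); ranking in descending order, as described above, gives the complementary exceedance probability. The function name and the toy data are assumptions for illustration.

```python
import numpy as np

def cunnane_cdf(values):
    """Sorted values and their Cunnane non-exceedance plotting positions."""
    x = np.sort(np.asarray(values, dtype=float))   # ascending order
    n = x.size
    rank = np.arange(1, n + 1)
    p = (rank - 0.4) / (n + 0.2)
    return x, p

x, p = cunnane_cdf([3.0, 1.0, 4.0, 2.0])   # toy data, not model output
for xi, pi in zip(x, p):
    print(f"{xi:.1f}  {pi:.4f}")
```

Plotting `p` against `x` for each sample gives empirical cdfs of the kind compared in Fig. 6. The positions are symmetric about 0.5, so the smallest and largest values never receive probabilities of exactly 0 or 1.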

Evaluation of LHS-MC sample sizes

For the first LHS-MC 100 sample, the LHS-MC 300 sample, the LHS-MC 500 sample and the RS-MC sample, cdfs for toluene concentrations in the water, gas, and solid phases are presented in Fig. 6. The cdfs for each of the remaining BTEX component concentrations were similar.

To study the ability of each of the different LHS-MC sample sizes to produce output cdfs that replicate corresponding RS-MC output cdfs, the two sample Kolmogorov–Smirnov (KS) goodness of fit test was used to evaluate the goodness of fit between each of the LHS-MC output cdfs and the corresponding RS-MC output cdfs. The two sample KS test is based on cdfs and can be used to test whether two empirical distributions are different.

The two sample KS test uses the maximum difference between the cdfs as the test statistic. The Z test statistic is a function of the combined sample size and the largest absolute difference between the two cdfs. The two sample KS test results for toluene concentrations in the water, gas, and solid phases are summarized in Table 4.
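A minimal sketch of the two sample KS computation is given below. It assumes the common large-sample form Z = D·sqrt(n1·n2/(n1 + n2)), where D is the largest absolute difference between the two empirical cdfs, with a p value from the asymptotic Kolmogorov series; the source does not state which software or formula produced Table 4, so this choice is an assumption, as are the function names and the synthetic data.

```python
import math
import numpy as np

def ks_two_sample(a, b):
    """Largest cdf difference D and scaled statistic Z for two samples."""
    a = np.sort(np.asarray(a, dtype=float))
    b = np.sort(np.asarray(b, dtype=float))
    grid = np.concatenate([a, b])
    # Empirical cdfs of both samples evaluated at every observed value.
    cdf_a = np.searchsorted(a, grid, side="right") / a.size
    cdf_b = np.searchsorted(b, grid, side="right") / b.size
    d = float(np.max(np.abs(cdf_a - cdf_b)))
    z = d * math.sqrt(a.size * b.size / (a.size + b.size))
    return d, z

def kolmogorov_p_value(z, terms=100):
    """Asymptotic two-sided p value, 2 * sum((-1)^(k-1) * exp(-2 k^2 z^2))."""
    if z < 0.2:
        return 1.0   # the p value is numerically 1 for very small z
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))

rng = np.random.default_rng(7)              # hypothetical seed
small = rng.lognormal(0.0, 0.5, 100)        # stand-in for an LHS-MC output sample
large = rng.lognormal(0.0, 0.5, 10_000)     # stand-in for the RS-MC output sample

d, z = ks_two_sample(small, large)
print("D:", d, "Z:", z, "p:", kolmogorov_p_value(z))
```

Under this assumption, `kolmogorov_p_value(0.5423)` evaluates to about 0.930 and `kolmogorov_p_value(0.8465)` to about 0.471, close to the p values listed in Table 4 for those Z statistics.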

The p value of the KS Z-test statistic for all phases and for all LHS-MC sample sizes is greater than 0.05. At the 5% significance level the null hypothesis, that the corresponding LHS-MC and RS-MC cdfs are not different, cannot be rejected. Similarly, the two sample KS test results for the other BTEX components did not reject the null hypothesis. The two sample KS test results therefore indicate that LHS-MC cdfs for all three sample sizes are able to replicate the corresponding RS-MC cdfs.

This indicates that LHS-MC can be used for the uncertainty analysis of MOFAT in lieu of RS-MC. All three LHS-MC sample sizes were able to replicate the corresponding RS-MC cdfs, so it will be more economical to use an LHS-MC sample size of 100 for the uncertainty analysis of MOFAT. Conducting an uncertainty analysis of MOFAT using an LHS-MC sample size of 100 would require 100 h to assemble the inputs, execute the files and post process the outputs, whereas using an RS-MC sample size of 10 000 would require 10 000 h.

Evaluation of LHS-MC stability

To test the stability of using LHS-MC for uncertainty analysis of MOFAT, the cdfs for each of the three replicated LHS-MC 100 samples were compared to the corresponding RS-MC cdfs using the two sample KS test. The results for toluene are summarized in Table 5.

The p value of the KS Z-test statistic for all three repli-cated LHS-MC sample sizes is greater than 0.05 for allphases. At the 5% significance level the null hypothesis,that each of the corresponding replicated LHS-MC and RS-MC cdfs are not different, cannot be rejected. The two sam-ple KS test results prove that all three replicated LHS-MCcdfs are able to replicate the corresponding RS-MC cdfs.The cdfs for toluene concentrations in the water, gas, andsolid phases for the three replicated LHS-MC samples andthe RS-MC sample are presented in Fig. 7.

This indicates that the LHS-MC sampling technique is a robust technique that, irrespective of the replicate LHS-MC sample used, produces cdfs that are able to replicate the corresponding RS-MC cdfs.
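The replicate comparison described above can be sketched as follows. This is only an outline of the procedure under stated assumptions: `toy_model` is a hypothetical two-parameter stand-in and not MOFAT, the inputs are taken as uniform on the unit interval, the seed is arbitrary, and the p value uses the asymptotic Kolmogorov distribution, which the paper does not confirm as the method used.

```python
import bisect
import math
import random


def lhs_column(n, rng):
    """One latin hypercube column: a single draw in each of n equal strata."""
    column = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(column)
    return column


def toy_model(k, phi):
    """Hypothetical stand-in for a single model run (NOT MOFAT)."""
    return math.exp(2.0 * k) / (1.0 + phi)


def ks_two_sample(a, b):
    """Return (D, Z, p) for the two sample KS test with an asymptotic p value."""
    a, b = sorted(a), sorted(b)
    n1, n2 = len(a), len(b)
    d = 0.0
    for x in a + b:
        # Empirical cdfs evaluated at x; the largest gap occurs at a data point.
        f1 = bisect.bisect_right(a, x) / n1
        f2 = bisect.bisect_right(b, x) / n2
        d = max(d, abs(f1 - f2))
    z = d * math.sqrt(n1 * n2 / (n1 + n2))
    if z < 0.2:
        return d, z, 1.0  # the series below converges slowly; p is 1 here
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * z * z)
                  for k in range(1, 101))
    return d, z, min(1.0, max(0.0, p))


rng = random.Random(7)
n_rs, n_lhs = 10_000, 100

# Reference output sample from simple random sampling (RS-MC).
reference = [toy_model(rng.random(), rng.random()) for _ in range(n_rs)]

# Three independent LHS replicates, each compared with the reference sample.
for replicate in (1, 2, 3):
    k_col, phi_col = lhs_column(n_lhs, rng), lhs_column(n_lhs, rng)
    outputs = [toy_model(k, phi) for k, phi in zip(k_col, phi_col)]
    d, z, p = ks_two_sample(outputs, reference)
    print(f"replicate {replicate}: D = {d:.4f}, Z = {z:.4f}, p = {p:.4f}")
```

A replicate would count as consistent with the reference sample when its p value exceeds the chosen significance level, here 0.05.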

Conclusions

The following conclusions can be made based on the results presented in this study:

(1) For the uncertainty analysis of MOFAT, LHS-MC can be used in lieu of RS-MC to study model accuracy (from a parametric uncertainty perspective). Using LHS-MC for the uncertainty analysis of MOFAT, the RS-MC cdfs can be accurately replicated.

(2) While all three LHS-MC sample sizes were able to replicate the corresponding RS-MC cdfs, it will be more economical to use a LHS-MC sample size of 100 for the uncertainty analysis.

(3) LHS-MC is a stable and robust technique that, irrespective of the replicate LHS-MC sample used, produces cdfs that are able to replicate the corresponding RS-MC cdfs.

Table 4. Two sample KS test results for different LHS-MC sample sizes.

LHS-MC sample size   Phase   Kolmogorov–Smirnov Z test statistic   P value
100                  Water   0.5423                                0.9303
100                  Gas     0.5413                                0.9313
100                  Solid   0.5423                                0.9303
300                  Water   0.8465                                0.4707
300                  Gas     0.8465                                0.4707
300                  Solid   0.8465                                0.4707
500                  Water   0.6583                                0.7790
500                  Gas     0.6583                                0.7790
500                  Solid   0.6583                                0.7790

Table 5. Two sample KS test results for three LHS-MC replicates.

LHS-MC replicate     Phase   Kolmogorov–Smirnov Z test statistic   P value
1                    Water   0.5423                                0.9303
1                    Gas     0.5413                                0.9313
1                    Solid   0.5423                                0.9303
2                    Water   0.4816                                0.9745
2                    Gas     0.4826                                0.9740
2                    Solid   0.4786                                0.9760
3                    Water   0.6338                                0.8166
3                    Gas     0.6318                                0.8195
3                    Solid   0.6338                                0.8166

Acknowledgements

The authors thank the staff of the computing section of Memorial University of Newfoundland’s Faculty of Engineering and Applied Science and University Computing Services for the extensive computing resources and support that were extended to support this research, especially the unbridled use of the Unix workstations CRUNCH and PLATO without which this work would not have been possible.

624 J. Environ. Eng. Sci. Vol. 7, 2008

© 2008 NRC Canada

Fig. 7. Comparison of cdfs for toluene for three replicated LHS-MC samples.