Methods and Example Case Study for Analysis of Variability and Uncertainty in
Emissions Estimation (AUVEE)
Prepared by:
H. Christopher Frey, Ph.D.
Junyu Zheng
Computational Laboratory for Energy, Air and Risk
Department of Civil Engineering
North Carolina State University
Raleigh, NC
Prepared for:
Office of Air Quality Planning and Standards
U.S. Environmental Protection Agency
Research Triangle Park, NC
February 2001
Disclaimer
This document was furnished to the U.S. Environmental Protection Agency by
North Carolina State University. This document is final and has been reviewed and
approved for publication. The opinions, findings, and conclusions expressed represent
those of the authors and not necessarily the EPA. Any mention of company or product
names does not constitute an endorsement by the EPA.
Table of Contents
1.0 INTRODUCTION
1.1 Project Objectives
1.2 Variability and Uncertainty
1.3 Probabilistic Methods
1.4 Motivations for the Selected Case Study: Utility NOx Emissions
1.5 Overview of this Report
2.0 METHODOLOGY
2.1 Visualizing Data Using Empirical Distributions and Scatter Plots
2.2 Selecting a Parametric Distribution for a Model Input
2.2.1 Normal Distribution
2.2.2 Lognormal Distribution
2.2.3 Gamma Distribution
2.2.4 Weibull Distribution
2.2.5 Beta Distribution
2.3 Parameter Estimation of Parameter Distributions
2.3.1 Normal Distribution
2.3.2 Lognormal Distribution
2.3.3 Weibull Distribution
2.3.4 Gamma Distribution
2.3.5 Beta Distribution
2.4 Evaluation of Goodness of Fit of a Probability Distribution Model
2.5 Numerical Methods for Generating Samples from Probability Distributions
2.5.1 Normal Distribution
2.5.2 Lognormal Distribution
2.5.3 Weibull Distribution
2.5.4 Gamma Distribution
2.5.5 Beta Distribution
2.6 Bootstrap Simulation and Application to Characterization of Variability and Uncertainty Using Parametric Distributions
2.7 Two-Dimensional Simulation of Uncertain Frequency Distributions
2.8 Propagating Distributions Through a Model
2.9 Analyzing Probabilistic Emission Inventory Results
2.10 Summary
3.0 DEVELOPMENT OF INPUT DATA FOR UTILITY NOx EMISSIONS CASE STUDIES
3.1 Origin and Description of Utility NOx Emissions Data
3.2 Development of Data Files for Selected Averaging Times
3.3 Data Screening and Quality Assurance
3.4 The Structure of the Final Database
3.5 Calculation of Emission Factors and Activity Factors
3.6 Evaluation of Possible Statistical Dependencies in the Database
3.6.1 Comparison of 1997 and 1998 Data
3.6.2 Evaluation of Possible Dependencies Between Activity and Emission Factors
3.7 Statistical Summary of the Database
4.0 AUVEE SYSTEM DEVELOPMENT AND IMPLEMENTATION
4.1 General Structure of the AUVEE Prototype Software
4.2 Databases in the AUVEE Prototype Software
4.3 Modules in the AUVEE Prototype Software
4.3.1 Fitting Distribution Model
4.3.2 Characterizing Uncertainty Module
4.3.3 Emission Inventory Module
4.3.4 User Data Input Module
4.3.5 Graphical User Interface (GUI)
4.4 Software Development Tools
5.0 DEVELOPMENT OF A PROBABILISTIC EMISSION INVENTORY
5.1 General Approach
5.2 Emission Inventory Model
5.3 Development of Probability Distributions for the Emission Inventory Model Inputs
5.4 A Probabilistic Approach for Calculating Uncertainty in the Emission Inventory of Coal-Fired Power Plants
5.5 Identifying Key Sources of Uncertainty
6.0 EXAMPLE CASE STUDY
6.1 Fitting Distributions to Data to Represent Inter-Unit Variability
6.2 Quantifying Uncertainty in Statistics of the Fitted Distributions
6.3 Evaluating Goodness-of-Fit Using Bootstrap Results
6.4 Quantifying Uncertainty in the Inputs to an Emission Inventory
6.5 Propagating Uncertainty in Emission Inventory Inputs to Predict Uncertainty in Emission Inventory Outputs
6.5.1 Uncertainty Results for the Example Six Month Emission Inventory
6.5.2 Uncertainty Results for the Example Twelve Month Emission Inventory
6.6 Identifying Key Sources of Uncertainty in the Inventory
7.0 CONCLUSIONS
8.0 ACKNOWLEDGMENTS
9.0 REFERENCES
List of Figures
Figure 2-1. Plot Illustrating the 95 Percent Probability Range on a Cumulative Distribution Function
Figure 2-2. Simplified Flow Diagram for Bootstrap Simulation and Two-Dimensional Simulation of Uncertainty and Variability
Figure 3-1. Scatter plot of 6-month NOx Emission Rate of 1997 and 1998
Figure 3-2. Scatter plot of 12-month Capacity Factor of 1997 and 1998
Figure 3-3. Scatter Plot for 6-month Average Heat Rate versus 6-month Average Capacity Factor for Tangential-Fired Boilers Using Low NOx Burners and Overfire Air Option 1. (n=41)
Figure 3-4. Scatter Plot for 6-month Average NOx Emission Rate versus 6-month Average Capacity Factor for Tangential-Fired Boilers Using Low NOx Burners and Overfire Air Option 1. (n=41)
Figure 3-5. Scatter Plot for 6-month Average NOx Emission Rate versus 6-month Average Heat Rate for Tangential-Fired Boilers Using Low NOx Burners and Overfire Air Option 1. (n=41)
Figure 4-1. Conceptual Design of the Analysis of Uncertainty and Variability in Emissions Estimation (AUVEE) Prototype Software System
Figure 5-1. Flow Diagram Illustrating the Propagation of Variability in Emission Inventory Inputs to Obtain a Point Estimate of Total Emissions
Figure 5-2. Flow Diagram Illustrating the Propagation of Variability and Uncertainty in Emission Inventory Inputs to Quantify the Uncertainty in the Estimate of Total Emissions
Figure 5-3. Flowchart for Calculating Uncertainty in Emission Inventory Using Bootstrap Simulation
Figure 6-1. Comparison of Fitted Lognormal Distribution and Six-Month Average NOx Emission Factor Data for Tangential-Fired Boilers with NOx Control
Figure 6-2. Comparison of Fitted Beta Distribution and Six-Month Average Capacity Factor Data for Tangential-Fired Boilers with NOx Control
Figure 6-3. Comparison of Fitted Lognormal Distribution and Six-Month Average Heat Rate Data for Tangential-Fired Boilers with NOx Control.
Figure 6-4. Probability Bands Representing Uncertainty in the Parametric Distribution Fitted to NOx Emission Factor Data for T/LNC1 (n=41)
Figure 6-5. Probability Bands Representing Uncertainty in the Parametric Distribution Fitted to NOx Capacity Factor Data for T/LNC1 (n=41)
Figure 6-6. Probability Bands Representing Uncertainty in the Parametric Distribution Fitted to Heat Rate Data for T/LNC1 (n=41)
Figure 6-7. Probability Bands Based Upon Number of Units in the Emission Inventory (n=11) for the Example of the Emission Factor of the T/LNC1 Technology Group
Figure 6-8. Probability Bands Based Upon Number of Units in the Emission Inventory (n=11) for the Example of Capacity Factor of the T/LNC1 Technology Group
Figure 6-9. Probability Bands Based Upon Number of Units in the Emission Inventory (n=11) for the Example of Heat Rate of the T/LNC1 Technology Group
Figure 6-10. Uncertainty in a Six-Month NOx Emission Inventory for an Individual Technology Group (T/LNC1) Comprised of 11 Units.
Figure 6-11. Uncertainty in a Six-Month NOx Emission Inventory Inclusive of Four Technology Groups.
Figure 6-12. Uncertainty in a 12-Month NOx Emission Inventory for an Individual Technology Group (T/LNC1) Comprised of 11 Units.
Figure 6-13. Uncertainty in a Twelve-Month NOx Emission Inventory Inclusive of Four Technology Groups
Figure 6-14. Relative Importance of Uncertainty in Emissions from Individual Technology Groups with Respect to Overall Uncertainty in the Total Emission Inventory: Results from the Six-Month Emission Inventory Case Study.
Figure 6-15. Relative Importance of Uncertainty in Emissions from Individual Technology Groups with Respect to Overall Uncertainty in the Total Emission Inventory: Results from the Six-Month Emission Inventory Case Study.
List of Tables
Table 2-1. Expressions for Log-likelihood Functions for Data Belonging to Various Probability Distribution Models.
Table 3-1. Summary of Data for Use in Case Studies.
Table 3-2. Statistical Summary of the 1998 6-month Database for Five Selected Technology Groups
Table 3-3. Statistical Summary of the 1998 12-month Database for Five Selected Technology Groups
Table 5-1. Summary of Selected Best Fit Parametric Distribution and Parameters for Emission and Activity Factors for Five Coal-Fired Power Plant Technology Groups Based Upon Six-Month Average Data.
Table 5-2. Summary of Selected Best Fit Parametric Distribution and Parameters for Emission and Activity Factors for Five Coal-Fired Power Plant Technology Groups Based Upon Twelve-Month Average Data.
Table 6-1. Summary of Uncertainty in 6-month Emission Inventory Mean Emission and Activity Factors Based Upon National Data
Table 6-2. Summary of Uncertainty in 12-month Emission Inventory Mean Emission and Activity Factors Based Upon National Data
Table 6-3. Summary of the Goodness-of-Fit of Parametric Distributions Fitted to Emission and Activity Factor Data for a Six-Month Emission Inventory Based Upon Evaluation of the Proportion of Data Enclosed by the 50 Percent and 95 Percent Probability Bands of the Fitted Cumulative Distribution Function.
Table 6-4. Summary of the Goodness-of-Fit of Parametric Distributions Fitted to Emission and Activity Factor Data for a 12-Month Emission Inventory Based Upon Evaluation of the Proportion of Data Enclosed by the 50 Percent and 95 Percent Probability Bands of the Fitted Cumulative Distribution Function.
Table 6-5. Summary of Uncertainty in 6-month Emission Inventory Mean Emission and Activity Factors Based Upon the Number of Units in the Example Case Study
Table 6-6. Summary of Uncertainty in 12-month Emission Inventory Mean Emission and Activity Factors Based Upon the Number of Units in the Example Case Study
Table 6-7. Summary of Uncertainty Results for the Six Month Emission Inventory Case Study
Table 6-8. Summary of Uncertainty Results for the Twelve Month Emission Inventory Case Study
1.0 INTRODUCTION
Emission Inventories (EIs) are a vital component of environmental decision
making. For example, emission inventories are used by federal, state, and local
governments and by private corporations for: (a) characterization of temporal emission
trends; (b) emissions budgeting for regulatory and compliance purposes; and (c)
prediction of ambient pollutant concentrations using air quality models. If random errors
and biases in the EIs are not quantified, they can lead to erroneous conclusions regarding
trends in emissions, source apportionment, compliance, and the relationship between
emissions and ambient air quality.
There is growing recognition of the importance of quantitative uncertainty
analysis in environmental modeling and assessment. The National Research Council
(NRC) has recently recommended that quantifiable uncertainties be addressed in
estimating mobile source emission factors, and in the past has addressed the need for
understanding of uncertainties in emission inventories used in air quality modeling and in
risk assessment (NRC, 1991; 1994; 2000). The U.S. Environmental Protection Agency
(EPA) has developed guidelines for Monte Carlo analysis of uncertainty, and has also
sponsored several workshops regarding probabilistic analysis (EPA, 1996; 1997; 1999).
As part of previous and ongoing work, research is underway to develop and
demonstrate improved methods for quantifying uncertainty in emission inventories. In
the area of mobile source emissions, for example, Kini and Frey (1997) developed
quantitative estimates of uncertainty associated with the Mobile5b emission factor model
estimates of light duty gasoline vehicle base emissions and speed-corrected emissions.
Pollack et al. (1999) performed a similar study on California's EMFAC7G highway
vehicle emission factor model. Frey et al. (1999) revisited the earlier analysis of
Mobile5b emission factor estimates to include uncertainties associated with temperature
corrections. Bammi and Frey (2001) estimated uncertainty in the emission factors for a
non-road source category of lawn and garden equipment. Frey and Li (2001) estimated
uncertainty in emission factors for stationary natural gas-fueled internal combustion
engines.
In the area of power plant emissions, Rubin et al. (1993) and Frey and colleagues
have developed uncertainty estimates for emissions of hazardous air pollutants and for
NOx emitted by coal-fired power plants (Frey and Rhodes, 1996; Frey and Bharvirkar,
2001; Frey et al., 1999; Rhodes and Frey, 1997). In addition, as part of recent work,
methods for quantification of variability and uncertainty in emissions estimation have
been developed, evaluated, and demonstrated, including the use of Monte Carlo
simulation and bootstrap simulation (Frey and Rhodes, 1998; Frey and Burmaster, 1999;
Cullen and Frey, 1999).
1.1 Project Objectives
Emission inventory work should include characterization and evaluation of the
quality of data used to develop the inventory. In this project, we demonstrate a
quantitative approach to the characterization of both variability and uncertainty as an
important foundation for conveying the quality of estimates to analysts and decision
makers.
The objectives of this project are to:
(1) Demonstrate a general probabilistic approach for quantification of variability and
uncertainty in emission factors and emission inventories;
(2) Demonstrate the insights obtained from the general probabilistic approach
regarding the ranges of variability and uncertainty in both emissions factors and
emission inventories;
(3) Demonstrate how probabilistic analysis can be used to identify key sources of
variability and uncertainty in an inventory for purposes of targeting additional
work to improve the quality of the inventory;
(4) Develop a prototype software tool for calculation of variability and uncertainty in
statewide inventories for a selected emission source and pollutant; and
(5) Facilitate the transfer of the general approach and prototype software tool to
federal, state or local governments or other recipients via development of
appropriate technical and software documentation of the approach and the
prototype software.
To satisfy these five objectives, a prototype software tool was developed. The prototype
software is "Analysis of Uncertainty and Variability in Emissions Estimation," or
AUVEE. The purpose of this software is to demonstrate a general methodology for
characterization of both variability and uncertainty in emission inventories. A specific
case study example was selected to illustrate methods for probabilistic emission
inventories. The selected case study, power plant NOx emissions, was chosen because
power plant emissions represent a large contribution to national NOx emissions. NOx
emissions are a significant concern because of their contribution to local and regional
ozone formation. Thus, this example is expected to be of widespread interest.
This report provides technical documentation of the theoretical basis for the
probabilistic emission inventory calculations, of the database used for the specific case
study, of the general structure of the AUVEE system, and of an example case study
illustrating the use of the probabilistic capability. The accompanying user's manual (Frey
and Zheng, 2000) documents the methodology of the software tool.
1.2 Variability and Uncertainty
The AUVEE software takes into account both variability and uncertainty in the
process of developing a probabilistic emission inventory. Variability is the heterogeneity
of values with respect to time, space, or a population. Uncertainty arises due to lack of
knowledge regarding the true value of a quantity. Variability in emissions arises from
factors such as: (a) variation in feedstock (e.g., fuel) compositions; (b) inter-plant
variability in design, operation, and maintenance; and (c) intra-plant variability in
operation and maintenance. Uncertainty typically arises due to statistical sampling error,
measurement errors, and systematic errors. In most cases, emissions estimates are both
variable and uncertain. Therefore, we employ a methodology for simultaneous
characterization of both variability and uncertainty based upon previous work in
emissions estimation, exposure assessment, and risk assessment. The method features the
use of Monte Carlo and bootstrap simulation.
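The distinction between variability and uncertainty can be illustrated with a minimal sketch. The example below is not the AUVEE implementation: it uses a simple empirical (nonparametric) bootstrap rather than the parametric bootstrap described later in this report, and the emission factor values, the function name `bootstrap_mean_interval`, and the number of replicates are all hypothetical. The spread of values across units represents variability, while the bootstrap interval for the mean represents uncertainty caused by the small sample size.

```python
import random
import statistics


def bootstrap_mean_interval(data, n_boot=2000, seed=0):
    """Approximate 95 percent interval for the mean by resampling with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Each bootstrap replicate is a resample of size n; keep its mean.
    means = sorted(
        statistics.fmean(rng.choices(data, k=n)) for _ in range(n_boot)
    )
    lower = means[round(0.025 * n_boot)]
    upper = means[round(0.975 * n_boot) - 1]
    return lower, upper


# Hypothetical emission factors (g/GJ) for 11 units; not data from this report.
unit_emission_factors = [210.0, 235.0, 250.0, 265.0, 280.0, 300.0,
                         320.0, 340.0, 375.0, 410.0, 520.0]

# Variability: the range of values from one unit to another.
variability_range = (min(unit_emission_factors), max(unit_emission_factors))

# Uncertainty: how well the mean is known from only 11 units.
sample_mean = statistics.fmean(unit_emission_factors)
uncertainty_range = bootstrap_mean_interval(unit_emission_factors)

print("inter-unit range:", variability_range)
print("sample mean:", sample_mean)
print("approx. 95 percent interval for the mean:", uncertainty_range)
```

In this sketch the interval for the mean is narrower than the inter-unit range, and it would be expected to narrow further as more units are sampled, whereas the inter-unit variability would not.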
1.3 Probabilistic Methods
The specifics of the methodology used by the AUVEE software are documented
in this report. A previous report by Frey, Bharvirkar, and Zheng (1999) illustrates the
application of similar methods to three case studies. In addition, there are other technical
reports and papers which illustrate the use of probabilistic methods. Examples of these
include Cullen and Frey (1999), Efron and Tibshirani (1993), EPA (1996), EPA (1997),
EPA (1999), Frey (1998a&b), and Frey and Rhodes (1998). Probabilistic methods have
previously been demonstrated in the context of air toxics emissions estimation, highway
vehicle emission factors, and utility emissions (e.g., Frey, 1997; Kini and Frey, 1997;
Frey, 1998b; Frey and Rhodes, 1996; Frey et al., 1998; Frey et al., 1999a; Frey et al.,
1999b).
1.4 Motivations for the Selected Case Study: Utility NOx Emissions
The uncertainty analysis in the example case study is framed from the perspective of
estimating future emissions. Because continuous emission monitoring (CEM) equipment
measures hourly NOx emissions at a large number of power plants in the U.S., it is
possible in many cases to characterize recent emissions of these plants with a
comparatively high degree of accuracy (e.g., perhaps precise to within approximately
plus or minus 3 percent; see Frey and Tran, 1999).
However, when making estimates of emissions any time into the future, it is more
difficult to make a precise prediction. This is because there is underlying variability in
the emissions of a single unit from one time period to another, even if the unit load is
similar. Therefore, the purpose of the case study in the AUVEE prototype software tool
is to assist in developing probabilistic estimates of future emission inventories based
upon statistical analysis of representative CEM data.
The prototype software tool was developed to demonstrate a methodology. It was
not intended to be comprehensive in terms of scope of coverage of all possible power
plant technologies. To illustrate the methodology, five "technology groups" have been
selected for characterization. A "technology group" is a combination of power plant unit
furnace technology and of NOx control technology (e.g., tangential-fired furnace with
combustion-based NOx control). The methods used to characterize variability and
uncertainty in the emissions associated with these five technology groups can be
extended later to include other technology groups. Furthermore, the methods can be
extended to other source categories and other pollutants.
In developing emission inventories, it is important to keep in mind the averaging
time associated with the inventory. For example, in the prototype version of the AUVEE
software tool, we include two different averaging times for power plant NOx emissions.
One is a 6-month averaging time, which is inclusive of the 2nd and 3rd quarters of the
year. This 6-month period, therefore, includes the summer months which constitute the
peak of the "ozone season." The other averaging time is a 12-month average, which
would be useful for developing estimates of uncertainty in annual emission inventories.
The prototype AUVEE software tool does not currently have a provision for calculating
emission inventories for any other averaging time. Because the range of uncertainty in
emission inventories is a function of the averaging time used in the inventory, the results
of the uncertainty analyses from the prototype AUVEE software should not be applied to
other averaging times without appropriate adjustments.
Although the methodology used in the AUVEE prototype software tool is one that
can be widely applied, the results generated by the program are specific to the technology
groups, averaging times, user input assumptions (e.g., number of units of each technology
group and their sizes), data sets, and probabilistic assumptions (e.g., selection of
parametric distributions) used in applying the software. Therefore, when reporting
results from the use of the AUVEE software tool, we recommend that the user carefully
document all of the assumptions used in a given case study so that another user could
reproduce the same results.
1.5 Overview of this Report
The theoretical basis for the methodology employed in this work is documented in
Chapter 2. In Chapter 3, the data used for the case study are described in detail, including
procedures by which available databases were used to create databases specific for the
case studies and the AUVEE prototype software. The general structure of the AUVEE
prototype software is described in Chapter 4. The approach for developing a
probabilistic emission inventory is presented in Chapter 5. An illustrative case study is
given in Chapter 6. The case study demonstrates key steps in a probabilistic emission
inventory, and also illustrates the technical capabilities of the AUVEE prototype software.
Conclusions and recommendations are offered in Chapter 7. Readers interested in more
detail regarding how to use the AUVEE software are referred to the accompanying User's
Manual (Frey and Zheng, 2000).
2.0 METHODOLOGY
In this chapter, the methodology used in the prototype software AUVEE for
conducting probabilistic analysis is discussed. Six areas of interest in this project are: (1)
the visualization of datasets using empirical distributions; (2) the selection of model input
distributions; (3) estimation of parameters of a distribution; (4) techniques for sampling
values from a distribution; (5) the use of bootstrap techniques to quantify variability and
uncertainty in quantities such as activity factors and emission factors using parametric
distributions; and (6) methods for propagating distributions through an emission
inventory and for analyzing results.
2.1 Visualizing Data Using Empirical Distributions and Scatter Plots
Some of the key purposes of visualizing data sets include: (1) evaluation of the
central tendency and dispersion of the data; (2) visual inspection of the shape of the
empirical distribution of the data as a potential aid in selecting parametric probability
distribution models to fit to the data; (3) identification of possible anomalies in the data
set (e.g., outliers); and (4) identification of possible dependencies between variables.
Specific techniques for evaluating and visualizing data include calculation of summary
statistics, development of empirical cumulative distribution functions, and generation of
scatter plots for the evaluation of dependencies between pairs of activity and emission
factors. All quantities considered in this study are assumed to be continuous random
variables.
Three key characteristics of a cumulative distribution function are its central
tendency, dispersion, and shape. There are several measures of central tendency, which
include mean, median, and mode. The dispersion, or the spread, of a distribution is
measured by the standard deviation or the variance of the distribution. The relative
standard deviation (RSD), also known as the coefficient of variation (CV), is the standard
deviation divided by the mean. The CV provides a normalized indication of the
dispersion of data values, with a large CV indicating relatively large variability in the
data set. The shape of the distribution is reflected by measurable quantities such as
skewness and kurtosis. These statistics can be used to aid in the selection of a parametric
probability distribution model to fit to the data (Cullen and Frey, 1999).
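As a minimal sketch of the summary statistics described above, the following example computes the mean, median, standard deviation, coefficient of variation, skewness, and kurtosis for a small data set. The data values and the function name `summary_statistics` are hypothetical, and the moment-based definitions of skewness and kurtosis used here are one common convention; this report does not state which estimator the AUVEE software uses.

```python
import statistics


def summary_statistics(data):
    """Return central tendency, dispersion, and shape statistics for a data set."""
    n = len(data)
    mean = statistics.fmean(data)
    median = statistics.median(data)
    std = statistics.stdev(data)  # sample standard deviation (n - 1 denominator)
    cv = std / mean               # coefficient of variation (relative standard deviation)
    # Central moments with an n denominator, used for the shape statistics.
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5     # zero for a symmetric distribution
    kurtosis = m4 / m2 ** 2       # equals 3 for a Normal distribution
    return {
        "mean": mean,
        "median": median,
        "std": std,
        "cv": cv,
        "skewness": skewness,
        "kurtosis": kurtosis,
    }


# Hypothetical emission factors (g/GJ); not data from this report.
emission_factors = [210.0, 250.0, 265.0, 300.0, 340.0, 410.0, 520.0]
stats = summary_statistics(emission_factors)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

A positive skewness together with a mean above the median, as in this hypothetical data set, is the kind of evidence that might point toward a positively skewed model such as the Lognormal rather than the Normal.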
A Cumulative Distribution Function (CDF) is a relationship between “cumulative
probability” and values of the random variable. Cumulative probability is the probability
that the random variable has values less than or equal to a given numerical value.
Cumulative distribution functions provide a relationship between fractiles and quantiles.
A fractile is the fraction of values that are less than or equal to a given value of a random
variable. Fractiles expressed on a percentage basis are referred to as percentiles. A
quantile is the value of a random variable associated with a given fractile. For example,
the range of data values enclosed by the 0.025 and 0.975 fractiles (2.5 and 97.5
percentiles) is often of particular interest, since it provides an indication of the dispersion
of a distribution as reflected by the 95 percent probability range of values. An example
of a CDF is illustrated in Figure 2-1.
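The relationship between fractiles and quantiles can be sketched as follows. The function names `fractile` and `quantile` and the data are hypothetical, and the linear interpolation between order statistics is only one of several possible conventions; it is not necessarily the one used in this study.

```python
def fractile(data, value):
    """Fraction of data values that are less than or equal to `value`."""
    return sum(1 for x in data if x <= value) / len(data)


def quantile(data, p):
    """Value associated with fractile p (0 <= p <= 1), using linear
    interpolation between neighboring order statistics."""
    xs = sorted(data)
    pos = p * (len(xs) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])


# Hypothetical data: 41 evenly spaced values from 200 to 800 (g/GJ).
data = [float(v) for v in range(200, 801, 15)]

# The 95 percent probability range is bounded by the 0.025 and 0.975 fractiles.
lower = quantile(data, 0.025)
upper = quantile(data, 0.975)
print("95 percent probability range:", lower, "to", upper)
print("fractile of 500:", fractile(data, 500.0))
```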
Empirical estimation of a fractile from data requires rank ordering of the data.
There are several possible methods for estimating the percentile of an empirically
observed data point. These methods are referred to as “plotting positions.” The plotting
position is an estimate of the cumulative probability of a data point. As described by
Cullen and Frey (1999), Harter (1984) provides an overview of the various types of
plotting positions. A commonly used plotting position, proposed by Hazen (1914), is
used in this study.
[Figure 2-1. Plot Illustrating the 95 Percent Probability Range on a Cumulative Distribution Function. Horizontal axis: NOx Emission Factor (gram/GJ fuel input), 200 to 800. Vertical axis: Cumulative Probability, 0.0 to 1.0.]
$$F_X(x_i) = \Pr(X < x_i) = \frac{i - 0.5}{n}, \quad \text{for } i = 1, 2, \ldots, n \text{ and } x_1 < x_2 < \cdots < x_n$$
where,
i = Rank of the data point when the data set is arranged in an ascending order
n = number of data points
x1 < x2 < … < xn are data points in the rank-ordered data set
Pr(X<xi) = Cumulative probability of obtaining a data point whose value is less
than xi
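As a minimal sketch of the Hazen plotting position defined above, the following Python function rank-orders a data set and assigns each point the cumulative probability (i − 0.5)/n. The function name and the sample values are hypothetical and are not taken from the AUVEE software.

```python
def hazen_plotting_positions(data):
    """Return (x_i, F_i) pairs with F_i = (i - 0.5) / n for rank-ordered data."""
    ordered = sorted(data)          # ascending order, so rank i = 1 is the smallest value
    n = len(ordered)
    return [(x, (i - 0.5) / n) for i, x in enumerate(ordered, start=1)]


# Hypothetical emission-factor values, for illustration only.
sample = [412.0, 305.0, 560.0, 478.0, 350.0]
ecdf = hazen_plotting_positions(sample)
for x, p in ecdf:
    print(f"{x:6.1f}  {p:.2f}")
```

With n data points, the smallest and largest values receive cumulative probabilities of 0.5/n and 1 − 0.5/n, so no point is plotted at exactly 0 or 1.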
2.2 Selecting a Parametric Distribution for a Model Input
Selecting appropriate parametric distributions for uncertain and/or variable inputs
is crucial to the integrity of probabilistic analysis results. A distribution is selected based
on how well it represents a sample data set from the population, the judgments of experts,
the characteristics of the underlying population, or some combination of these factors.
The best representative distribution may be empirical, or any of a number of parametric
distributions (e.g. Normal, Lognormal, Uniform, Triangular, Exponential, Beta, Gamma,
Weibull, etc.).
In choosing a distribution function to represent either variability or uncertainty, it
is often useful to theorize about processes that generate both the data and particular types
of distributions. A priori knowledge of the mechanisms that impact a quantity may lead
to the selection of a distribution to represent that quantity. For example, an underlying
mechanism based on the central limit theorem (CLT) may lead to the selection of the
Normal or Lognormal distribution. Other factors to consider may be whether values must
be non-negative, which rules out infinite two-tailed distributions such as the Normal, or
whether or not the distribution is symmetric. Discussions of distribution selection criteria
can be found in Hahn and Shapiro (1967), Morgan and Henrion (1990), Hattis and
Burmaster (1994), and Seiler and Alvarez (1996), among others. Five commonly used
parametric distributions (Normal, Lognormal, Weibull, Gamma, and Beta distributions)
are used in this project to represent variability. Uncertainty due to measurement error is
commonly represented as a Normal distribution. A distribution of uncertainty due to
sampling error depends on the uncertain parameter. For example, for a normally
distributed data set, a sampling distribution for the mean can be represented by a
Student’s t-distribution (Johnson and Kotz, 1970b), and for the variance by a chi-square
distribution (Steel and Torrie, 1980). More generally, sampling distributions can be
represented by empirical distributions (Law and Kelton, 1991). In the following sections,
definitions and the basis for selection are presented for the five parametric distributions
for variability.
2.2.1 Normal Distribution
The Normal Distribution is defined by the probability density function (PDF),
$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$ (2-1)
for all real numbers x, where µ is the arithmetic mean, and σ² is the arithmetic variance.
The Normal distribution is widely used in part because it has been well studied
and frequently used in classical statistics (Morgan and Henrion, 1990). A theoretical
criterion for selecting the Normal distribution is based on the central limit theorem.
According to the central limit theorem, the distribution of standardized sums of random
variables tends to a unit normal distribution as the number of variables in the sum
increases (Johnson and Kotz, 1970a). Therefore, the Normal distribution can be used to
represent a quantity for which the underlying mechanism can be described by the CLT,
such as the resultant of a large number of additive independent errors. An example of a
process generated by the sum of many random variations is pollutant dispersion as
described by the Gaussian plume model (Seinfeld, 1986). Strictly, the Normal distribution is not
appropriate for representing non-negative quantities because it has an infinite negative
tail. However, it can be safely used for non-negative quantities, such as weight or length,
so long as the coefficient of variation is less than about 0.2 (Morgan and Henrion, 1990).
If the mean is more than five standard deviations from zero, then the probability of
selecting a random value less than zero is below 10^-6 (about 3 × 10^-7 when the mean is exactly five standard deviations from zero).
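The size of the negative tail can be checked directly from the standard Normal CDF. The sketch below is illustrative only (the function name is hypothetical); it uses the identity Φ(z) = 0.5·erfc(−z/√2) and assumes a positive mean, so that a coefficient of variation of 0.2 places zero five standard deviations below the mean.

```python
import math


def normal_prob_below_zero(cv):
    """P(X < 0) for a Normal variable with positive mean and coefficient of variation cv."""
    z = -1.0 / cv                                   # zero lies 1/cv standard deviations below the mean
    return 0.5 * math.erfc(-z / math.sqrt(2.0))     # standard Normal CDF evaluated at z


p_cv02 = normal_prob_below_zero(0.2)   # mean is five standard deviations above zero
p_cv05 = normal_prob_below_zero(0.5)   # mean is only two standard deviations above zero
print(p_cv02, p_cv05)
```

For a coefficient of variation of 0.5 the same calculation gives a probability of a few percent, which illustrates why the Normal distribution becomes questionable for non-negative quantities with large relative variability.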
2.2.2 Lognormal Distribution
The Lognormal distribution is defined by the PDF
$$f(x) = \frac{1}{x\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(\ln x-\mu)^2}{2\sigma^2}\right]$$ (2-2)
for x > 0.
The CLT can also be used as the basis for selecting a Lognormal distribution to
represent a quantity. A result of the CLT is that if a large number of random variables
are multiplied together (their logarithms are added), then the result tends toward a
Lognormal distribution (their logarithms are normally distributed). The Lognormal
distribution has often been found to be a good representation of non-negative, positively
skewed physical quantities, such as pollutant concentrations (Morgan and Henrion,
1990). An example of a quantity that is non-negative, and results from the product of
many random variations is the dilution of pollutant concentrations (Hattis and Burmaster,
1994).
2.2.3 Gamma Distribution
The Gamma distribution, G(α,β), is defined by the PDF
$$f(x) = \frac{\beta^{-\alpha} x^{\alpha-1} e^{-x/\beta}}{\Gamma(\alpha)}$$ (2-3)
for x > 0, where α is the shape parameter, β is the scale parameter, and Γ(·) is the gamma
function.
The Gamma distribution can be justified on theoretical grounds as a time-to-failure
model (Law and Kelton, 1991). However, it has also been found empirically to
represent a wide variety of phenomena, such as distributions for non-negative
quantities. The Gamma distribution encompasses a number of special cases. For
example, the Gamma(1, β) distribution is an Exponential distribution with mean of β, and
the Gamma(k/2, 2) distribution is a chi-square distribution with k degrees of freedom (Hahn
and Shapiro, 1967). The chi-square distribution can be used to represent a sampling
distribution for the variance of a normally distributed quantity.
2.2.4 Weibull Distribution
The Weibull distribution, W(α,β), is defined by the PDF
$$f(x) = \alpha \beta^{-\alpha} x^{\alpha-1} \exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right]$$ (2-4)
for x > 0, where α > 0 is the shape parameter, and β > 0 is the scale parameter.
The Weibull distribution, like the Gamma distribution, has often been found, on
empirical grounds, to be a good representation of data sets. While the theoretical
justifications for the Weibull distribution are based upon time-to-failure and extreme
value theory (Hahn and Shapiro, 1967), this distribution has been used to represent non-
negative quantities such as ambient air pollutant concentrations (Seinfeld, 1986). One
special case of the Weibull distribution is that for α = 1, the Weibull distribution is the
same as an exponential distribution with a mean of β.
2.2.5 Beta Distribution
The Beta distribution is characterized by finite upper and lower bounds and two
shape parameters. A Beta distribution bounded by zero and one is a “two-parameter
Beta,” while a Beta distribution with other values for the minimum and maximum is
considered to be a “four-parameter Beta.”
The two-parameter Beta distribution, Beta(α,β), bounded by the interval [0,1], is
defined by the PDF
$$f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha,\beta)}$$ (2-5)
for 0 < x < 1, where α and β are shape parameters, and B(α,β) is the beta function.
A theoretical basis for the Beta distribution is that it arises from a ratio of
Gamma-distributed quantities: if Y1 and Y2 are independent Gamma variates with a
common scale, then Y1/(Y1 + Y2) follows a Beta distribution (see Section 2.5.5).
The two-parameter Beta distribution, bounded by the interval [0,1],
is useful for representing variability or uncertainty in a fraction that cannot exceed one.
For example, a Beta distribution can be used to represent partitioning factors that range from zero
to one. The partitioning factors are based upon the ratio of the distribution for output
mass flow to the distribution for input mass flow. Because the Beta distribution can take
on a wide variety of shapes, such as negatively skewed, symmetric, and positively
skewed, it has found a wide variety of applications to represent empirical data or the
judgments of experts.
2.3 Parameter Estimation of Parametric Distributions
A probability distribution model is a description of the probabilities of all possible
values in a sample space. A probability model is typically represented as a probability
density function (PDF) or a CDF for a continuous random variable. The PDF for a
continuous random variable indicates the relative likelihood of values. The CDF is
obtained by integrating the PDF (Cullen and Frey, 1999).
Probability distribution models may be empirical, parametric, or combinations of
both. A parametric probability distribution model is a model described by parameters.
The power of using parametric probability distribution models is that data sets, which
may contain large numbers of values, can be described in a compact manner based on a
particular type of parametric distribution function and the values of its parameters. For
example, a normal distribution is fully specified if its mean and variance are known.
Another potential advantage of parametric probability distributions compared to
empirical distributions is that it is possible to make predictions in the tails of the
distribution beyond the range of observed data. In contrast, using conventional empirical
distributions, the minimum and maximum values of the distribution are limited to the
minimum and maximum values, respectively, of the data set. These values typically
change as more data are collected.
In order to estimate values of the parameters of a parametric distribution,
statistical estimation methods must be used. Using these estimation methods, inferences
are made from an available data set regarding a best estimate of the parameter values.
Usually, there are alternative methods available to estimate parameter values from
analysis of data sets. Thus, it is necessary to choose a parameter estimation method.
Small (1990) has discussed the following six characteristics of estimators for the
parameters of probability distribution models. These characteristics are useful when
comparing and selecting an estimation method:
1. Consistency: A consistent estimator converges to the “true” value of the parameter as the number of samples increases.
2. Lack of Bias: An unbiased estimator yields an average value of the parameter estimate that is equal to that of the population value.
3. Efficiency: An efficient estimator has minimum variance in the sampling distribution of the estimate. A sampling distribution is a probability distribution for a statistic (e.g., mean, standard deviation, distribution parameters).
4. Sufficiency: An estimator that makes maximum use of information contained in a data set is said to be sufficient.
5. Robustness: A robust estimator is one that works well even if there are departures from the underlying distribution. In other words, it will yield reasonable values of the parameters even if there are some anomalies in the data set.
6. Practicality: A practical estimator is one that satisfies the needs for the preceding five characteristics while remaining computationally efficient.
Based upon visual inspection of an empirical distribution function as described in
Section 2.1, and consideration of processes that generated the data as described in Section
2.2, the analyst will make a judgment regarding selection of one or more candidate
parametric distributions to fit to the data set. Once a particular parametric distribution has
been selected, a key step is to estimate the parameters of the distribution. The method of
Maximum Likelihood Estimation (MLE) and the Method of Matching Moments
(MoMM) are among the most typical techniques used for estimating the parameters.
MoMM is based upon matching the moments or central moments of a parametric
distribution (e.g., mean, variance) to the moments or central moments of the data set.
MoMM estimators are often easy to calculate. For example, there are convenient
solutions for MoMM parameter estimates for Normal, Lognormal, Gamma, and Beta
distributions (Hahn and Shapiro, 1967).
The method of maximum likelihood estimation involves the selection of
parameter values which are most likely to yield the observed data set (Cohen and
Whitten, 1993). A likelihood function for independent samples is defined as the product
of the PDF evaluated at each of the sample values. For a continuous random variable, for
which independent samples have been obtained, the likelihood function is:
$$L(\theta_1, \theta_2, \ldots, \theta_k) = \prod_{i=1}^{n} f(x_i \mid \theta_1, \theta_2, \ldots, \theta_k)$$ (2-6)
where,
θ1, θ2, …, θk = Parameters of the parametric probability distribution model
k = number of parameters for the parametric probability distribution model
xi = Values of the random variable, for i = 1, 2, …, n
n = number of data points in the data set
f = Probability density function
Usually, k is equal to two (corresponding to a two-parameter distribution) or three
(corresponding to a three-parameter distribution). The values of the parameters that
maximize the likelihood function are sometimes determined analytically using standard
techniques of calculus. That is, the first partial derivatives of the likelihood function
taken with respect to the parameters are set equal to zero. In many cases, it is more
convenient to work with a log transformation of the likelihood function, referred to as a
log-likelihood function, which has its maximum at the same parameter values. When an
analytical solution is not readily available, the maximum
likelihood parameter estimates can be found using numerical techniques such as the
Newton-Raphson method or non-linear programming optimization. In this project,
non-linear optimization was used to maximize the log-likelihood function.
The log-likelihood functions for estimating the parameters of Normal,
Lognormal, Gamma, Weibull, and Beta distributions are shown in Table 2-1. The number
of data points is n and each data point is represented as xi, where i takes the values 1
through n.
For small sample sizes, the maximum likelihood estimates do not always yield
minimum variance or unbiased estimates (Holland and Fitz-Simmons, 1982). However,
for larger sample sizes, the maximum likelihood method tends to better satisfy the first
five criteria for statistical estimation than other methods. Compared to MLE, MoMM
estimators tend to be more robust but less efficient. MLE can be extended to estimate
parameters for distributions fitted to censored data. In the present study, the method of
maximum likelihood estimation and a modified moment estimation method have been
used to estimate the parameters for the probability distribution models. Specifically,
MoMM is used to obtain initial estimates of the distribution parameters, and those
initial values are then used as the starting point for a non-linear optimization that
yields the MLE parameter estimates. The techniques for estimating parameters for the five
parametric distributions discussed in this project using the method of matching moments
are provided in Section 2.3.1 through Section 2.3.5.
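Table 2-1. Expressions for Log-likelihood Functions for Data Belonging to Various Probability Distribution Models.

| Name of Distribution (a) | Log-likelihood Function |
|---|---|
| Normal (µ = mean, σ = standard deviation) | $J(\mu,\sigma) = -n\ln(\sigma) - \frac{n}{2}\ln(2\pi) - \sum_{i=1}^{n}\frac{(x_i-\mu)^2}{2\sigma^2}$ |
| Lognormal (µ = mean, σ = standard deviation, of log-transformed data) | $J(\mu,\sigma) = -n\ln(\sigma) - \frac{n}{2}\ln(2\pi) - \sum_{i=1}^{n}\frac{(\ln(x_i)-\mu)^2}{2\sigma^2}$ |
| Gamma (α = shape, β = scale, parameters) | $J(\alpha,\beta) = -n\left\{\alpha\ln(\beta) + \ln[\Gamma(\alpha)]\right\} + \sum_{i=1}^{n}\left[(\alpha-1)\ln(x_i) - \frac{x_i}{\beta}\right]$ |
| Weibull (α = shape, β = scale, parameters) | $J(\alpha,\beta) = n\ln\left(\frac{\alpha}{\beta}\right) + \sum_{i=1}^{n}\left[(\alpha-1)\ln\left(\frac{x_i}{\beta}\right) - \left(\frac{x_i}{\beta}\right)^{\alpha}\right]$ |
| Beta (α, β = shape parameters) | $J(\alpha,\beta) = -n\ln\left[\frac{\Gamma(\alpha)\Gamma(\beta)}{\Gamma(\alpha+\beta)}\right] + \sum_{i=1}^{n}\left\{(\alpha-1)\ln(x_i) + (\beta-1)\ln(1-x_i)\right\}$ |

(a) Note: Parameter values are different for each type of distribution even though the same symbol may be used to represent parameters of different distributions.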
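The two-stage approach described above (matching-moments estimates as starting values, followed by numerical maximization of the log-likelihood) can be sketched for the Gamma distribution as follows. This is an illustrative sketch, not the optimizer used in the project: it assumes a simple golden-section search over the shape parameter, uses the fact that for a fixed α the log-likelihood is maximized by β = (sample mean)/α, and assumes the maximum lies within a factor of 20 of the matching-moments shape. All function names and the synthetic data are hypothetical.

```python
import math
import random


def gamma_loglik(alpha, beta, xs):
    """Gamma log-likelihood J(alpha, beta) in the form given in Table 2-1."""
    n = len(xs)
    return (-n * (alpha * math.log(beta) + math.lgamma(alpha))
            + sum((alpha - 1.0) * math.log(x) - x / beta for x in xs))


def gamma_momm(xs):
    """Matching-moments estimates: alpha = mean^2 / s^2, beta = s^2 / mean."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean * mean / var, var / mean


def gamma_mle(xs, span=20.0, iters=100):
    """Golden-section search over alpha, started from the matching-moments estimate."""
    mean = sum(xs) / len(xs)
    alpha0, _ = gamma_momm(xs)

    def profile(a):
        # For fixed alpha, beta = mean / alpha maximizes J(alpha, beta).
        return gamma_loglik(a, mean / a, xs)

    lo, hi = alpha0 / span, alpha0 * span      # assumed bracket around the starting value
    g = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(iters):
        c, d = hi - g * (hi - lo), lo + g * (hi - lo)
        if profile(c) < profile(d):
            lo = c                             # maximum lies to the right of c
        else:
            hi = d                             # maximum lies to the left of d
    alpha = 0.5 * (lo + hi)
    return alpha, mean / alpha


rng = random.Random(1)
data = [rng.gammavariate(2.0, 3.0) for _ in range(500)]   # synthetic data: shape 2, scale 3
a0, b0 = gamma_momm(data)
a1, b1 = gamma_mle(data)
print(a0, b0, a1, b1)
```

A general-purpose non-linear optimizer over both parameters would serve the same purpose; the one-dimensional search is used here only to keep the sketch self-contained.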
2.3.1 Normal Distribution
The parameters for the Normal distribution are the arithmetic mean, µ, and
variance, σ². The mean is estimated by the sample mean, $\bar{X}$, and the variance by the
sample variance, s², according to the following equations:
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$$ (2-7)
$$s^2 = \frac{1}{n}\sum_{i=1}^{n} \left(X_i - \bar{X}\right)^2$$ (2-8)
2.3.2 Lognormal Distribution
The parameters of the Lognormal distribution can be defined as: (1) the
geometric mean, $\mu_g$, and geometric standard deviation, $\sigma_g$, estimated by $\hat{\mu}_g$ and $\hat{\sigma}_g$,
respectively; (2) the mean and standard deviation of the logarithm of X, $\mu_{\ln(x)}$ and $\sigma_{\ln(x)}$
(also written $\mu_{LN}$ and $\sigma_{LN}$), estimated by $\hat{\mu}_{\ln(x)}$ and $\hat{\sigma}_{\ln(x)}$, respectively; or (3) the arithmetic mean and standard
deviation, µ and σ, estimated by $\bar{X}$ and s, respectively.
The method of matching moments can also be used to estimate the geometric
mean and geometric standard deviation, and the mean and standard deviation of the
logarithm of x. The following transformations between the arithmetic mean and variance,
the geometric mean and geometric standard deviation, and the mean and variance of ln(X)
are based on the method of matching moments (Law and Kelton, 1991):
$$\hat{\mu}_g = \exp\left(\hat{\mu}_{LN}\right) = \frac{\bar{X}^2}{\sqrt{s^2 + \bar{X}^2}}$$ (2-9)
$$\hat{\sigma}_g = \exp\left(\hat{\sigma}_{LN}\right) = \exp\left[\sqrt{\ln\left(\frac{s^2 + \bar{X}^2}{\bar{X}^2}\right)}\right]$$ (2-10)
In this study, the geometric mean, µg, and the geometric standard deviation, σg, are used
as the parameters to define the Lognormal distribution.
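The transformations in Equations (2-9) and (2-10) can be illustrated with a short Python sketch. The function name and the input values are hypothetical; the round-trip check relies on the standard Lognormal identities that the arithmetic mean is exp(µLN + σLN²/2) and the arithmetic variance is [exp(σLN²) − 1] times the squared mean.

```python
import math


def lognormal_momm(mean, var):
    """Return (mu_ln, sigma_ln, geometric mean, geometric std dev) from Eqs. (2-9) and (2-10)."""
    sigma_ln = math.sqrt(math.log((var + mean ** 2) / mean ** 2))
    mu_g = mean ** 2 / math.sqrt(var + mean ** 2)
    return math.log(mu_g), sigma_ln, mu_g, math.exp(sigma_ln)


# Hypothetical sample mean and variance, for illustration only.
mu_ln, sigma_ln, mu_g, sigma_g = lognormal_momm(mean=500.0, var=100.0 ** 2)

# Round trip back to the arithmetic mean and variance of the fitted Lognormal.
mean_back = math.exp(mu_ln + 0.5 * sigma_ln ** 2)
var_back = (math.exp(sigma_ln ** 2) - 1.0) * mean_back ** 2
print(mu_g, sigma_g, mean_back, var_back)
```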
2.3.3 Weibull Distribution
The parameters of interest for the Weibull distribution are the shape parameter α,
and the scale parameter β, which are estimated by $\hat{\alpha}$ and $\hat{\beta}$, respectively. The
parameters of the Weibull distribution can be estimated using the method of matching
moments by estimating the mean and variance of the data, and solving the following two
equations for $\hat{\alpha}$ and $\hat{\beta}$:
$$\hat{\mu} = \frac{\hat{\beta}}{\hat{\alpha}}\,\Gamma\left(\frac{1}{\hat{\alpha}}\right)$$ (2-11)
$$\hat{\sigma}^2 = \frac{\hat{\beta}^2}{\hat{\alpha}}\left\{2\,\Gamma\left(\frac{2}{\hat{\alpha}}\right) - \frac{1}{\hat{\alpha}}\left[\Gamma\left(\frac{1}{\hat{\alpha}}\right)\right]^2\right\}$$ (2-12)
where Γ is the gamma function (Law and Kelton, 1991). Equations (2-11) and (2-12)
can be solved numerically for $\hat{\alpha}$ and $\hat{\beta}$ using Newton’s method.
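A sketch of one way to solve Equations (2-11) and (2-12) is given below. Dividing (2-12) by the square of (2-11) eliminates β and leaves a single equation in α, σ²/µ² = 2αΓ(2/α)/[Γ(1/α)]² − 1. For simplicity the sketch solves this equation by bisection rather than by the Newton's method named in the text; the search interval for α and the function names are assumptions made for illustration.

```python
import math


def weibull_cv2(alpha):
    """sigma^2 / mu^2 implied by Eqs. (2-11) and (2-12); it depends on alpha only."""
    g1, g2 = math.gamma(1.0 / alpha), math.gamma(2.0 / alpha)
    return 2.0 * alpha * g2 / (g1 * g1) - 1.0


def weibull_momm(mean, var, lo=0.1, hi=50.0, tol=1e-12):
    """Bisection on alpha (the ratio decreases as alpha grows), then beta from Eq. (2-11)."""
    target = var / (mean * mean)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if weibull_cv2(mid) > target:
            lo = mid        # implied relative spread too large: the shape must be larger
        else:
            hi = mid
    alpha = 0.5 * (lo + hi)
    beta = mean * alpha / math.gamma(1.0 / alpha)
    return alpha, beta


# Check against the Exponential special case: mean 2 and variance 4 imply alpha = 1, beta = 2.
alpha_hat, beta_hat = weibull_momm(mean=2.0, var=4.0)
print(alpha_hat, beta_hat)
```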
2.3.4 Gamma Distribution
The parameters of interest for the Gamma distribution are the shape parameter α,
and the scale parameter β, where $\hat{\alpha}$ is an estimate of α, and $\hat{\beta}$ is an estimate of β. The
method of matching moments can also be used to estimate the shape and scale parameters
of the Gamma distribution. These estimates are determined through the following
relationships between $\hat{\alpha}$ and $\hat{\beta}$, and the sample mean and sample variance, $\bar{X}$ and $s^2$
(Hahn and Shapiro, 1967):
$$\hat{\alpha} = \frac{\bar{X}^2}{s^2}$$ (2-13)
$$\hat{\beta} = \frac{s^2}{\bar{X}}$$ (2-14)
2.3.5 Beta Distribution
The Beta distribution has two shape parameters, which can be estimated in a
variety of ways. As indicated in Table 2-1, the shape parameters can be estimated using
the log-likelihood function of the Beta distribution. The shape parameters of the Beta
distribution can also be estimated using the method of matching moments. In the latter
approach, the parameters can be estimated through relationships with the sample mean
and sample variance, $\bar{X}$ and $s^2$ (Hahn and Shapiro, 1967):
$$\hat{\alpha} = \bar{X}\left[\frac{\bar{X}\left(1-\bar{X}\right)}{s^2} - 1\right]$$ (2-15)
$$\hat{\beta} = \left(1-\bar{X}\right)\left[\frac{\bar{X}\left(1-\bar{X}\right)}{s^2} - 1\right]$$ (2-16)
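As a brief illustration of Equations (2-15) and (2-16), the sketch below computes the two shape parameters from a hypothetical sample mean and variance and then confirms that the fitted Beta distribution reproduces them, using the standard expressions α/(α+β) for the mean and αβ/[(α+β)²(α+β+1)] for the variance. The function name and the input values are illustrative only.

```python
def beta_momm(mean, var):
    """Matching-moments shape parameters of a Beta distribution on [0, 1]."""
    common = mean * (1.0 - mean) / var - 1.0     # bracketed term shared by Eqs. (2-15) and (2-16)
    return mean * common, (1.0 - mean) * common


# Hypothetical sample moments of a fraction between zero and one.
alpha_hat, beta_hat = beta_momm(mean=0.25, var=0.01)

# Moments implied by the fitted parameters.
total = alpha_hat + beta_hat
mean_back = alpha_hat / total
var_back = alpha_hat * beta_hat / (total ** 2 * (total + 1.0))
print(alpha_hat, beta_hat, mean_back, var_back)
```

Both estimates are positive only when s² is smaller than the product of the mean and one minus the mean, which is worth checking before the estimates are used as starting values.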
2.4 Evaluation of Goodness of Fit of a Probability Distribution Model
The fitted parametric distributions that are hypothesized to represent the
population from which the available data were drawn may be evaluated for goodness-of-
fit using probability plots and test statistics. It is widely recognized that probability plots
are a subjective method for determining whether or not data contradict an assumed model
based upon visual inspection. However, some statistical methods, such as regression
techniques, chi-squared test, Kolmogorov-Smirnov test, and Anderson-Darling test, can
be used in conjunction with probability plots to provide a numerical indication of the
goodness-of-fit. Hahn and Shapiro (1967), Ang and Tang (1975), D'Agostino and
Stephens (1986), and Cullen and Frey (1999) have given a comprehensive description of
probability plotting and various goodness-of-fit tests. In this study, the empirical
distribution of the actual data set is compared visually with the cumulative probability
functions of the fitted distributions to aid in selecting the probability distribution model
which best describes the observed data. The bootstrap technique described in the next
section can also be used to check the adequacy of the fit.
2.5 Numerical Methods for Generating Samples from Probability Distributions
A combination of computing efficiency and programming simplicity is used as
the criterion for selecting methods for generating random samples from various
distributions using Monte Carlo sampling. The most efficient and simple method for
generating random variables is the method of inversion. This method is always used
when the CDF can be inverted. In many cases however, the inverse CDF cannot be
written in a closed form, and an alternative method is used. Some alternative methods
are the method of composition, the method of convolution, and the acceptance-rejection
method (Law and Kelton, 1991). In the following sections, the methods used in the
AUVEE prototype software to generate random variables for the Normal, Lognormal,
Weibull, Gamma, and Beta distributions will be described.
2.5.1 Normal Distribution
Generation of random variables from a Normal distribution is simplified by the
fact that any Normal distribution can be written in terms of the standard Normal
distribution (with a mean of zero and standard deviation of one):
If X ~ N(µ, σ²)
and X′ ~ N(0,1) (the Standard Normal),
then X = µ + σX′,
where “~” denotes “is distributed as.” Therefore, it is only necessary to generate random
variates from the Standard Normal. The Standard Normal random variates can be
generated using a transformation method developed by Box and Muller (1958),
and modified with an acceptance-rejection step by Marsaglia and Bray (1964). In this method, two U(0,1) random variates,
U1 and U2, are used to generate two N(0,1) random variates, X1 and X2. The Box and
Muller method is used to calculate X1 and X2 as follows:
$$X_1 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2), \qquad X_2 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2)$$ (2-17)
A more efficient version of the Box-Muller method, called the polar method, was
developed by Marsaglia and Bray (1964). The polar method is used in this study. The
algorithm is presented in Law and Kelton (1991) as follows:
1. Generate U1 and U2 as independent and identically distributed (IID) uniform
random variates on the interval [0,1], U(0,1). Let $V_i = 2U_i - 1$ for i = 1, 2,
and let $W = V_1^2 + V_2^2$.
2. If W > 1, go back to step 1. Otherwise, let $Y = \sqrt{(-2\ln W)/W}$, $X'_1 = V_1 Y$, and
$X'_2 = V_2 Y$. Then $X'_1$ and $X'_2$ are IID N(0,1) random variates.
3. $X_1 = \mu + \sigma X'_1$ and $X_2 = \mu + \sigma X'_2$ so that X1 and X2 are IID N(µ, σ²).
Since two normal random variates are generated with each call of this subroutine,
the procedure really only needs to be implemented on every other call. If U1 and U2 were
truly IID random variables from a U(0,1), then using X1 followed by X2 on subsequent
calls to the subroutine is valid. It has been shown, however, that if U1 and U2 are
sequential pseudo random numbers (as is the case in this implementation) then X1 and X2
will fall on a spiral in (X1, X2) space, rather than being truly IID. In order to ensure that
all normal random variates are truly IID in this implementation, only X1 is used and X2 is
discarded. Another option would be to generate U1 and U2 from separate and
independent pseudo-random number streams.
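The polar method steps listed above can be sketched in Python as follows. This is an illustrative sketch rather than the AUVEE implementation: the function names are hypothetical, and Python's built-in generator stands in for the report's pseudo-random number stream. As in the text, only X1 is returned by `normal_variate` and X2 is discarded. The check for W = 0 is an added guard against taking the logarithm of zero.

```python
import math
import random


def polar_standard_normals(rng):
    """Marsaglia-Bray polar method: return a pair of N(0,1) variates."""
    while True:
        v1 = 2.0 * rng.random() - 1.0
        v2 = 2.0 * rng.random() - 1.0
        w = v1 * v1 + v2 * v2
        if 0.0 < w <= 1.0:                       # accept only points inside the unit circle
            y = math.sqrt(-2.0 * math.log(w) / w)
            return v1 * y, v2 * y


def normal_variate(mu, sigma, rng):
    """Return one N(mu, sigma^2) variate; the second variate of the pair is discarded."""
    x1, _ = polar_standard_normals(rng)
    return mu + sigma * x1


rng = random.Random(42)
print([round(normal_variate(10.0, 2.0, rng), 3) for _ in range(5)])
```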
2.5.2 Lognormal Distribution
Lognormal random variates are generated by using a special property of the
Lognormal distribution. Namely, if $Y \sim N(\mu_{LN}, \sigma_{LN}^2)$, then $e^Y \sim LN(\mu_{LN}, \sigma_{LN}^2)$.
Lognormal random variates are therefore generated by the following algorithm:
1. Generate $Y \sim N(\mu_{LN}, \sigma_{LN}^2)$
2. $X = e^Y$, so that $X \sim LN(\mu_{LN}, \sigma_{LN}^2)$
Note that $\mu_{LN}$ and $\sigma_{LN}^2$ are not the arithmetic mean and variance of the Lognormal
distribution, but rather are the arithmetic mean and variance of the distribution of ln(X).
The transformations provided in Section 2.3 can be used to compute the arithmetic or
geometric mean and standard deviation.
2.5.3 Weibull Distribution
The CDF for the Weibull distribution can be written as
$$F(x) = 1 - \exp\left[-\left(\frac{x}{\beta}\right)^{\alpha}\right]$$ (2-18)
Random variates, X, from a W(α,β) can therefore be generated directly by the method of
inversion using the inverse CDF
$$X = F^{-1}(U) = \beta\left[-\ln(1-U)\right]^{1/\alpha}$$ (2-19)
where U is a random variate from the U(0,1) distribution.
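Equation (2-19) translates directly into code. The following sketch (with a hypothetical function name and illustrative parameter values) draws Weibull variates by inversion and compares the fraction of samples at or below the scale parameter β with the value 1 − e⁻¹ ≈ 0.632 implied by the CDF in Equation (2-18).

```python
import math
import random


def weibull_variate(alpha, beta, rng):
    """Inverse-CDF sample from W(alpha, beta) using Eq. (2-19)."""
    u = rng.random()                               # U(0,1) variate in [0, 1)
    return beta * (-math.log(1.0 - u)) ** (1.0 / alpha)


rng = random.Random(7)
samples = [weibull_variate(2.0, 3.0, rng) for _ in range(20000)]
frac_below_scale = sum(x <= 3.0 for x in samples) / len(samples)
print(frac_below_scale, 1.0 - math.exp(-1.0))
```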
2.5.4 Gamma Distribution
Like the Normal and Lognormal distributions, the Gamma distribution has no
closed form for its CDF or inverse CDF. Therefore the method of inversion is not
feasible for generating random variables. An Acceptance-Rejection method is used in
this study to generate Gamma random variables.
In generating G(α,β) random variables, it is noted that if X′ ~ G(α,1), then X =
βX′ ~ G(α,β). Therefore, only the G(α,1) distribution needs to be considered.
Furthermore, a Gamma distribution with α = 1, G(1,β), is simply an Exponential
distribution with a mean of β. Exponential random variables are easily generated by the
method of inversion. Gamma distributions for which α < 1 are shaped significantly
differently from Gamma distributions for which α > 1, and therefore two distinct
acceptance-rejection algorithms are necessary.
For α < 1, an acceptance-rejection algorithm by Ahrens and Dieter is used in this
study. A description of this method is provided in Law and Kelton (1991), where the
following algorithm is also presented:
1. Let b = (e + α)/e.
2. Generate U1 ~ U(0,1), and let P = bU1. If P > 1, go to step 4. Otherwise
proceed to step 3.
3. Let $Y = P^{1/\alpha}$, and generate U2 ~ U(0,1). If $U_2 \le e^{-Y}$, return X = Y; otherwise go
back to step 1.
4. Let Y = −ln[(b − P)/α] and generate U2 ~ U(0,1). If $U_2 \le Y^{\alpha-1}$, return X = Y;
otherwise go back to step 1.
For α > 1, a modified acceptance-rejection algorithm by Cheng (1977) is used to
sample random variates from a Gamma distribution. Again, a description of the method
is provided in Law and Kelton (1991). Only the algorithm is presented here:
1. Let $a = 1/\sqrt{2\alpha - 1}$, $b = \alpha - \ln 4$, $q = \alpha + 1/a$, $\theta = 4.5$, and $d = 1 + \ln\theta$.
2. Generate U1 and U2 as IID U(0,1).
3. Let $V = a\ln[U_1/(1 - U_1)]$, $Y = \alpha e^{V}$, $Z = U_1^2 U_2$, and $W = b + qV - Y$.
4. If W + d − θZ ≥ 0, return X = Y. Otherwise, go to step 5.
5. If W ≥ ln Z, return X = Y. Otherwise, go to step 1.
Step 4 in this algorithm is a pretest which, if passed, avoids the logarithm calculation in
the regular acceptance-rejection test in Step 5. Again, other methods exist for calculating
Gamma random variates (especially for the case where α > 1), but this method is
sufficiently efficient, and relatively simple.
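The two acceptance-rejection algorithms above, together with the scaling and Exponential special case noted earlier in this section, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the AUVEE code: the function names are hypothetical, Python's built-in uniform generator is used, and the skip when a uniform variate equals zero is an added guard against taking the logarithm of zero.

```python
import math
import random


def gamma_small_shape(alpha, rng):
    """Acceptance-rejection sampler for G(alpha, 1) with 0 < alpha < 1 (steps 1 to 4 above)."""
    b = (math.e + alpha) / math.e
    while True:
        p = b * rng.random()
        if p <= 1.0:
            y = p ** (1.0 / alpha)
            if rng.random() <= math.exp(-y):
                return y
        else:
            y = -math.log((b - p) / alpha)
            if rng.random() <= y ** (alpha - 1.0):
                return y


def gamma_large_shape(alpha, rng):
    """Cheng (1977) acceptance-rejection sampler for G(alpha, 1) with alpha > 1."""
    a = 1.0 / math.sqrt(2.0 * alpha - 1.0)
    b = alpha - math.log(4.0)
    q = alpha + 1.0 / a
    theta = 4.5
    d = 1.0 + math.log(theta)
    while True:
        u1, u2 = rng.random(), rng.random()
        if u1 == 0.0 or u2 == 0.0:
            continue                             # added guard: avoid log(0)
        v = a * math.log(u1 / (1.0 - u1))
        y = alpha * math.exp(v)
        z = u1 * u1 * u2
        w = b + q * v - y
        # Pretest first; the logarithm is evaluated only if the pretest fails.
        if w + d - theta * z >= 0.0 or w >= math.log(z):
            return y


def gamma_variate(alpha, beta, rng):
    """Sample G(alpha, beta) by scaling a G(alpha, 1) variate by beta."""
    if alpha == 1.0:
        return -beta * math.log(1.0 - rng.random())   # Exponential case by inversion
    core = gamma_small_shape if alpha < 1.0 else gamma_large_shape
    return beta * core(alpha, rng)


rng = random.Random(11)
print([round(gamma_variate(0.5, 2.0, rng), 3) for _ in range(3)])
print([round(gamma_variate(3.0, 2.0, rng), 3) for _ in range(3)])
```

A quick sanity check on any such sampler is that the sample mean approaches αβ and the sample variance approaches αβ² as the number of draws grows.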
2.5.5 Beta Distribution
The method used in this study for generating Beta random variates relies upon a
special property of the Beta distribution. This method uses the fact that the Beta
distribution can be described as a ratio comprised of Gamma distributions. If Y1 ~ G(α,1)
and Y2 ~ G(β,1) and Y1 and Y2 are independent, then X = Y1/(Y1+Y2) ~ B(α,β) (Law and
Kelton, 1991). Thus, the methods described for generating random variates from a
Gamma distribution are used here.
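The ratio property can be demonstrated in a few lines. In the sketch below, Python's standard-library `random.gammavariate` is used as a stand-in for the Gamma sampler described in Section 2.5.4, so this illustrates the property rather than the AUVEE implementation; the function name and parameter values are hypothetical.

```python
import random


def beta_variate(alpha, beta, rng):
    """Sample B(alpha, beta) as Y1 / (Y1 + Y2) with independent Y1 ~ G(alpha, 1), Y2 ~ G(beta, 1)."""
    y1 = rng.gammavariate(alpha, 1.0)
    y2 = rng.gammavariate(beta, 1.0)
    return y1 / (y1 + y2)


rng = random.Random(3)
draws = [beta_variate(2.0, 5.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(sample_mean, 2.0 / (2.0 + 5.0))     # the sample mean should be close to alpha / (alpha + beta)
```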
2.6 Bootstrap Simulation and Application to Characterization of Variability and Uncertainty Using Parametric Distributions
In this section, the bootstrap technique as described in detail by Efron and
Tibshirani (1993) is presented. Bootstrap simulation is a numerical technique originally
developed for the purpose of estimating confidence intervals for statistics based upon
random sampling error. This method has an advantage over analytical methods in that it
can provide solutions for confidence intervals in situations where exact analytical
solutions may be unavailable and in which approximate analytical solutions are
inadequate. For example, in estimating uncertainty in the sample mean, bootstrap
simulation does not require that the original data set be normally distributed, even for
small sample sizes. This advantage over analytical methods that are based on normality
assumptions makes bootstrap simulation a more versatile and robust method for
estimating uncertainty in a sample mean due to sampling error, especially for non-normal
data sets and small sample sizes. In addition, bootstrap simulation can be used to estimate
confidence intervals for other statistics, such as percentiles for entire CDFs.
The bootstrap technique addresses the issue of quantifying the random sampling
error that is introduced by estimating some statistic of interest from a limited number of
randomly sampled data points. The sample data points, x = {x1, x2, …, xn} are assumed to
be a random sample of size n from some unknown probability distribution F. The
parameter of interest, θ, is a characteristic of the distribution of F, θ = f(F), such as the
mean, variance, shape or scale parameter, or any fractile or quantile of the distribution F.
An estimate of θ is the statistic θ̂, which is determined from the data set, θ̂ = f(x).
Using the data set, x, the distribution F̂ is defined to be an estimate of the
unknown population distribution F. The distribution F̂ may be defined as either an
empirical distribution or a parametric distribution. The former is the basis for non-
parametric bootstrap, and the latter is the basis for parametric bootstrap (Efron and
Tibshirani, 1993). Non-parametric bootstrap is also commonly referred to as
"resampling." In this project, only situations involving the use of parametric distributions
are considered. One of the main shortcomings of resampling of a data set is that the
minimum and maximum values obtained are limited by the minimum and maximum
values within the data set. When only small data sets are available, this can lead to biases
in the representation of a given model input (e.g., failure to consider possible large values
that are not present in the limited data set). The use of parametric distributions is one way
to allow for the possibility that smaller or higher values than those observed in the data
set may occur in the real system being modeled.
A strong assumption in this project is that the data being analyzed are a randomly-
drawn, representative sample. This assumption may not be universally valid in the
context of environmental data. However, it is made for two main reasons: (1) it allows
the use of a powerful set of methods for characterizing both uncertainty and variability;
and (2) an indication of the lower bound for uncertainty can be developed. If data are not
a representative sample then other approaches could be developed to quantify variability
and uncertainty in combination with or instead of bootstrap. Such methods are beyond the
scope of this study.
For the case in which F̂ is defined to be a parametric distribution, the parameters
of the distribution are typically estimated on the basis of the observed data set, x.
Moment planes or knowledge of processes that created the data may be used to help
select an appropriate set of parametric distributions to consider (e.g., Hahn and Shapiro,
1967; Hattis and Burmaster, 1994). In the present study, the methods indicated in
Section 2.3 (i.e., MLE and MoMM) are used for parameter estimation.
The bootstrap method addresses uncertainty due to random sampling error by first
assuming that the original data set, x, of sample size n, is a random sample from the
distribution F̂ , and then repeatedly asking the question: What if the data set had been a
different set of n random values from the same distribution F̂ ? This question is answered
by repeatedly generating what are called “bootstrap samples.” A bootstrap sample, x*, is
defined as a random sample of size n taken from the distribution, F̂ . Bootstrap samples
may be simulated using random Monte Carlo simulation. A large number, B, of
independent bootstrap samples (x*1, x*2, …, x*B) are selected from the distribution F̂.
From each of the B bootstrap samples, a new statistic, θ̂*, is computed such that:
$$\hat{\theta}^{*i} = f\left(x^{*i}\right) \quad \text{for } i = 1, 2, \ldots, B$$ (2-20)
Each θ̂* is referred to as a bootstrap replicate of θ̂.
The bootstrap replications (θ̂*1, θ̂*2, …, θ̂*B) are each independent realizations of
an estimate of the parameter θ. The dispersion of values of the bootstrap replications
reflects the uncertainty in the sample estimate of the unknown parameter, θ , attributable
to random sampling error. The bootstrap replicate values describe an estimate of the
sampling distribution of the statistic. Since a statistic is estimated from randomly drawn
values, it is itself a random variable. The number of bootstrap replications necessary to
reasonably approximate the true sampling distribution of the statistic depends upon the
statistic being estimated. For example, according to Efron and Tibshirani (1993), to
compute the standard error of the mean (the original intent of the bootstrap technique), B
= 200 is generally enough and B = 25 is often sufficient. However, for computing
confidence intervals or estimating percentiles of sampling distributions, Efron and
Tibshirani (1993) suggest B = 1000. In examples for computing confidence intervals
given in Efron and Tibshirani (1993), the number of bootstrap replications ranges
between B = 1,000 and B = 2,000.
There are a number of variants of the parametric bootstrap method. The one
employed here is known as the percentile, or bootstrap-p, method. Bootstrap can be used
for estimating a confidence interval that has a (1-2α) probability of enclosing the true
value of a parameter, θ. The upper and lower bounds of this confidence interval are
determined by ordering the B bootstrap replicates of θ̂*, (θ̂*1, θ̂*2, …, θ̂*B), from smallest to largest. Given these
ordered statistics, the lower bound of the confidence interval (the 100αth percentile) is
the (B•α)th value in the ordered set, and the upper bound (the 100(1-α)th percentile) is
the (B•(1-α))th value. For example,
for B = 1,000 and α = 0.05, the 90% confidence interval for some parameter, θ, is given
by:
$$\left[\hat{\theta}^{*}_{(B\alpha)},\ \hat{\theta}^{*}_{(B(1-\alpha))}\right] = \left[\hat{\theta}^{*}_{(50)},\ \hat{\theta}^{*}_{(950)}\right] \qquad \text{(2-21)}$$
where $\hat{\theta}^{*}_{(50)}$ and $\hat{\theta}^{*}_{(950)}$ are simply the 50th and 950th values in the ordered set of the
bootstrap statistics.
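The parametric bootstrap and the percentile interval described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the software used in this study: a lognormal distribution is assumed for the fitted distribution, the function names (`fit_lognormal`, `bootstrap_replicates`, `percentile_interval`) are hypothetical, and the emission rates are made-up values.

```python
import math
import random
import statistics


def fit_lognormal(data):
    """Maximum likelihood estimates of the lognormal parameters
    (mean and standard deviation of ln x)."""
    logs = [math.log(v) for v in data]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs, mu)
    return mu, sigma


def bootstrap_replicates(data, statistic, n_boot=1000, seed=1):
    """Draw n_boot bootstrap samples of size n from the fitted distribution
    and return the sorted bootstrap replicates of `statistic`."""
    rng = random.Random(seed)
    n = len(data)
    mu, sigma = fit_lognormal(data)  # the fitted distribution (F-hat)
    replicates = []
    for _ in range(n_boot):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]  # x*
        replicates.append(statistic(sample))  # bootstrap replicate
    return sorted(replicates)


def percentile_interval(sorted_replicates, alpha=0.05):
    """Percentile (bootstrap-p) interval with (1 - 2*alpha) coverage."""
    b = len(sorted_replicates)
    lower_rank = max(1, round(b * alpha))        # 50 when B = 1000
    upper_rank = min(b, round(b * (1 - alpha)))  # 950 when B = 1000
    return sorted_replicates[lower_rank - 1], sorted_replicates[upper_rank - 1]


# Hypothetical emission rates (g/GJ), for illustration only.
rates = [152.0, 171.0, 188.0, 163.0, 210.0, 145.0, 178.0, 199.0, 167.0, 184.0]
reps = bootstrap_replicates(rates, statistics.fmean, n_boot=1000)
lower, upper = percentile_interval(reps, alpha=0.05)
print(f"90% interval for the mean: [{lower:.1f}, {upper:.1f}] g/GJ")
```

Passing a different `statistic` (for example, a function returning the standard deviation) gives the sampling distribution of that statistic from the same bootstrap samples.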
2.7 Two-Dimensional Simulation of Uncertain Frequency Distributions
To simulate uncertain frequency distributions, a two-dimensional simulation
approach based upon that employed by Frey and Rhodes (1996) is used. The overall
approach is illustrated in the simplified flow diagram in Figure 2-2. For a given input to
a model, uncertainty and variability must be characterized. Bootstrap simulation is used
to simulate the uncertainty in the parameters of a frequency distribution, F̂ , that has been
fitted to a data set of sample size n.
A total of B bootstrap samples of sample size n are simulated. For each bootstrap
sample, a new distribution is fitted and a bootstrap replication of the distribution
parameters is calculated. The bootstrap simulation produces paired parameter estimates.
In the case of censored data sets, the detection limit is imposed on each of the B bootstrap
samples before the parameters are estimated. These multivariate sampling distributions of
the parameters represent the uncertainty in the distribution parameters. In the two-
dimensional simulation, a total of q different frequency distributions are simulated, where
q = B = 500 in most cases presented here. We select B = 500 mainly because of
limitations on computer memory usage. Each alternative frequency distribution is based
upon a different set of bootstrap replicate distribution parameters. For each alternative
frequency distribution, a total of p random samples are simulated to represent one
possible realization of variability within the population. In this case, p = 500. Thus, a
total of 250,000 samples are generated, representing 500 samples from each of 500
alternative frequency distributions. For each realization of uncertainty, the samples are
sorted to represent cumulative distribution functions. Thus, there are 500 values for any
given statistic (e.g., mean, variance, 95th percentile of variability) which can be used to
construct sampling distributions for each statistic.
Bootstrap Simulation:
    Specify Probability Distribution F
    For i = 1 to B (where B = q):
        Generate n Random Samples from F to form one Bootstrap Sample
        Fit a Distribution to each Bootstrap Sample by Estimating a Bootstrap
        Replication of the Distribution Parameters
    Characterize Sampling Distributions Based upon Bootstrap Replications of
    Distribution Parameters
Two-Dimensional Simulation of Uncertainty and Variability:
    For nU = 1 to nU = q:
        Select One Pair of Distribution Parameters to Represent One
        Possible Distribution for Variability
        Simulate p Random Samples from the Specified Distribution to
        Represent Variability
Analysis and Reporting:
    Analyze Results to Characterize:
        - Confidence Intervals for CDF
        - Sampling Distributions for Mean, Variance, and Selected Percentiles
Figure 2-2. Simplified Flow Diagram for Bootstrap Simulation and Two-Dimensional
Simulation of Uncertainty and Variability. (Key: B = Number of Bootstrap
Replications, q = Sample Size Used for Uncertainty, p = Sample Size Used for Variability.) (Frey and Rhodes, 1998)
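The nested structure of the two-dimensional simulation can be illustrated with a short Python sketch. It is a simplified example and not the prototype software: a normal distribution is assumed for variability so that the two parameters can be re-estimated with sample moments, the function names are hypothetical, and the heat rates are made-up values. The outer loop represents uncertainty (q = B = 500) and the inner sampling represents variability (p = 500).

```python
import random
import statistics


def two_dimensional_simulation(data, q=500, p=500, seed=1):
    """Return q sorted lists of p values: one simulated variability
    distribution per bootstrap replicate of the distribution parameters."""
    rng = random.Random(seed)
    n = len(data)
    mean_hat = statistics.fmean(data)  # parameters of the fitted distribution
    sd_hat = statistics.stdev(data)
    realizations = []
    for _ in range(q):  # uncertainty dimension (B = q)
        boot = [rng.gauss(mean_hat, sd_hat) for _ in range(n)]
        mean_star = statistics.fmean(boot)  # paired bootstrap replicate
        sd_star = statistics.stdev(boot)    # of the two parameters
        # Variability dimension: p samples from this alternative distribution,
        # sorted so that each list represents one empirical CDF.
        realizations.append(sorted(rng.gauss(mean_star, sd_star) for _ in range(p)))
    return realizations


def percentile_95(sorted_values):
    """95th percentile of variability from one sorted realization."""
    return sorted_values[round(0.95 * len(sorted_values)) - 1]


def statistic_interval(realizations, statistic, alpha=0.05):
    """Percentile interval, across the q realizations, for any statistic."""
    values = sorted(statistic(r) for r in realizations)
    q = len(values)
    low = values[max(1, round(q * alpha)) - 1]
    high = values[min(q, round(q * (1 - alpha))) - 1]
    return low, high


# Hypothetical heat rates (BTU/kWh), for illustration only.
heat_rates = [10200.0, 11350.0, 9800.0, 12100.0, 10750.0, 11900.0, 10400.0, 11050.0]
sim = two_dimensional_simulation(heat_rates, q=500, p=500)
print("90% interval for the mean:", statistic_interval(sim, statistics.fmean))
print("90% interval for the 95th percentile:", statistic_interval(sim, percentile_95))
```

Each of the 500 sorted lists is one possible realization of variability, so any statistic evaluated on every list yields 500 values from which a sampling distribution can be constructed.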
2.8 Propagating Distributions Through a Model
In developing a probabilistic emission inventory, variability in emission and
activity factor data is quantified using parametric probability distribution models. The
uncertainty in the mean values of the emission and activity factors is estimated using
bootstrap simulation. The uncertainty in the emission inventory is estimated by using
Monte Carlo simulation to propagate the uncertainties in emission estimates for
individual emission sources within the inventory when estimating the total emission
inventory. The specific methodology for calculation of the probabilistic emission
inventory is described in more detail in Section 5.4.
2.9 Analyzing Probabilistic Emission Inventory Results
The results of a probabilistic emission inventory include probability distributions
for uncertainty in total emissions, probability distributions for uncertainty in emissions
from specific types of sources, and identification of key sources of uncertainty. These
types of results are discussed in more detail in Chapter 5.
2.10 Summary
In this chapter, key elements of the quantitative methodology for characterizing
variability and uncertainty in the inputs to an emission inventory, and for estimating
uncertainty in the total inventory, have been presented. In the next chapter, the data used
for the specific case study are discussed. The prototype software used to implement the
method described in this chapter, using the data described in the next chapter, is
presented in Chapter 4. Chapter 5 includes a detailed case study illustrating the
application of the methods described here to the example data using the prototype
software tool.
3.0 DEVELOPMENT OF INPUT DATA FOR UTILITY NOx
EMISSIONS CASE STUDIES
The methodology for probabilistic analysis, introduced in Chapter 2, is applied to
a case study of variability and uncertainty in electric utility coal-fired power plant NOx
emissions. The data used for the case study are based upon Continuous Emission
Monitoring (CEM) data for individual power plant units obtained through the U.S.
Environmental Protection Agency. In this chapter, the data are described, including the
source of the data and the content of the data.
3.1 Origin and Description of Utility NOx Emissions Data
The utility NOx emissions data used in the case studies of this project are from the
"Preliminary Summary Emissions Reports" of the Acid Rain Program of the U.S.
Environmental Protection Agency (EPA). These files contain summary emissions
information for electric utilities regulated by the EPA's Acid Rain Program. Each power
plant unit subject to the Acid Rain Program regulations is required to report hourly data,
describing emissions and operation, to EPA at the end of each calendar quarter. EPA
compiles and releases preliminary summary data in the form of "Preliminary Summary
Emissions Reports." These reports can be downloaded from the following web site:
http://www.epa.gov/acidrain/etsdata.html
In this project, only the quarterly data files are used.
Each of the reports lists data at the stack and/or unit level depending on how the
data are monitored and reported by the utility. The hierarchy of the data organization is:
State (e.g., North Carolina)
Holding company (utility, e.g., Carolina Power and Light)
Name of the plants (ORISPL identification number)
Unit / Stack identification
Each unit or stack can be uniquely identified by the combination of the ORISPL
identification number, which is unique to a single power plant, and the Unit/Stack ID. A
single power plant typically has multiple units and/or multiple stacks. For each unit or
stack, the following information is provided and used in this study: (1) boiler type (e.g.,
wall-fired, tangential fired); (2) primary fuel (e.g., coal); (3) NOx control technology
(e.g., uncontrolled or specified control technology); (4) total operation time; (5) quarterly
gross unit load (MW); (6) total quarterly heat input (million BTUs), and (7) average
hourly NOx emission rate (lb NOx as NO2/10^6 BTU of fuel input). There are also other
data fields in the EPA "Preliminary Summary Emissions Reports" that are not used in this
work. Such fields include, for example, information regarding sulfur dioxide and carbon
dioxide emissions.
There are three types of boiler and stack configurations that are included in the
EPA utility NOx emissions databases: (1) simple; (2) common; and (3) multiple. In the
simple configuration, there is one stack uniquely associated with just one power plant
unit. For example, if the power plant has five separate boilers (units), then there are also
five separate stacks, with one stack connected to only one unit. In the common
configuration, several units may deliver flue gas to one common stack. In the multiple
configuration, a single unit may deliver flue gas to two or more stacks. The data
configuration reflects the power plant design and influences the emissions monitoring
approach. Differences in configuration are reflected in the “Unit/Stack ID” field, with a
notation of “CS” for a common stack and “MS” for multiple stacks. A more detailed
description of the original data is available on the web page for "Description of
Preliminary Summary Emissions Reports" of the EPA's Acid Rain Program at the
following URL:
http://www.epa.gov/acidrain/ets/etsrpts.html
3.2 Development of Data Files for Selected Averaging Times
The utility NOx emissions data files available from EPA are reported on a
quarterly (3-month) average basis. In the case studies of this project, two averaging
times are considered: (1) 6-month; and (2) 12-month. The purpose of the 6-month
averaging time is to characterize emissions that include the "ozone season." The purpose
of the 12-month averaging time is to be able to characterize annual emissions for
emissions budgeting and other purposes.
To develop the data necessary for these case studies requires combining data from
two or more quarters and calculation of activity and emissions for the desired averaging
times. The 6-month time period is intended to be inclusive of summer months.
Therefore, the 6-month averages are based upon combining data from the 2nd and 3rd
quarters of the year, including the months from April through September. The 12-month
averages are based upon the entire year, and include the months from January through
December. At the time that the data collection effort was made, quarterly data were
available for the 1st quarter of 1997 through the 2nd quarter of 1999. Therefore, complete
datasets of four quarters were available only for 1997 and 1998. Furthermore, data sets
needed to characterize the 6-month period inclusive of the summer were available only
for 1997 and 1998.
In order to combine data from multiple quarters into a single database, "macros"
were developed using Visual Basic in Microsoft Excel™. The major steps in the data
combination process are described here:
Step 1: Create a List of All Units in All Quarterly Databases
The first step is to create a complete listing of all of the units that appear in any of
the 10 quarterly data files obtained from the EPA web site, and to save this listing as a
separate file. The resulting unit listing file contains a general description of all of the
units. Specific information included in the unit listing file includes the plant
identification (ORISPL number), unit/stack ID, state, and region. Note that this file does
not include information regarding emissions, control technology, and operation data such
as operating time and gross load. Those data are processed in Step 2. Step 1 is done by
the macro named "Collect_All( )."
Step 2: Create a Single CEMS Data File For All Available Quarterly Data
In the second step, all of the available relevant data for each unit is read from the
individual quarterly data files obtained from EPA and written to a new combined data file
referred to as "All." This work is based on the file generated by Step 1. Step 2 is done by
a macro named "Combine( )." Based on the power plant unit list generated by Step 1, the
macro searches all of the ten available individual quarterly databases, and creates a new
database with separate columns for each quarter that includes emissions data, control
technology information, and operation data. This process is repeated for every unit in the
data base created in Step 1. Thus, every unit or stack is reflected as a record in the new
table, which includes all the ten quarters of emissions data. In some cases, the control
technology may change from one quarter to another because of retrofits of control
technologies to an existing unit.
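The logic of Steps 1 and 2 can be sketched in Python as follows. The macros used in this project were written in Visual Basic, so this is only an illustrative analog: the function bodies, field names (`orispl`, `unit_id`, `nox_control`, `load_mwh`, `nox_rate`), quarter labels, and records are hypothetical.

```python
def collect_all(quarterly_files):
    """Step 1: list every unit or stack that appears in any quarterly file,
    keyed by (ORISPL, unit/stack ID), with descriptive fields only."""
    units = {}
    for records in quarterly_files.values():
        for rec in records:
            key = (rec["orispl"], rec["unit_id"])
            units.setdefault(key, {"state": rec["state"]})
    return units


def combine(units, quarterly_files):
    """Step 2: build one record per unit with one slot per quarter.
    A quarter in which the unit does not appear stays None."""
    quarters = list(quarterly_files)
    combined = {key: dict.fromkeys(quarters) for key in units}
    for quarter, records in quarterly_files.items():
        for rec in records:
            key = (rec["orispl"], rec["unit_id"])
            combined[key][quarter] = {
                "nox_control": rec["nox_control"],
                "load_mwh": rec["load_mwh"],
                "nox_rate": rec["nox_rate"],
            }
    return combined


# Two hypothetical quarterly files; the second unit reports only in 1997Q2.
files = {
    "1997Q1": [
        {"orispl": 1001, "unit_id": "1", "state": "NC",
         "nox_control": "U", "load_mwh": 650000.0, "nox_rate": 0.62},
    ],
    "1997Q2": [
        {"orispl": 1001, "unit_id": "1", "state": "NC",
         "nox_control": "LNB", "load_mwh": 700000.0, "nox_rate": 0.41},
        {"orispl": 1001, "unit_id": "CS1", "state": "NC",
         "nox_control": "U", "load_mwh": 900000.0, "nox_rate": 0.55},
    ],
}
unit_list = collect_all(files)
all_data = combine(unit_list, files)
print(all_data[(1001, "CS1")])
```

Keying each record on the pair (ORISPL, unit/stack ID) mirrors the unique identification described in Section 3.1, and keeping one slot per quarter makes a retrofit visible as a change in `nox_control` between quarters.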
Step 3: Record the Maximum Gross Capacity of Each Unit
In order to characterize plant activity in the case studies, it is desirable to be able
to calculate a "capacity factor." The capacity factor is the ratio of the power plant unit
actual output with respect to the maximum possible output for a given time period. Data
are provided in the EPA databases regarding the actual power plant unit output.
However, data are not contained in the Acid Rain Program databases regarding the
maximum gross load of the units. This information was obtained in a separate database
provided by EPA's Office of Air Quality Planning and Standards. Therefore, it is
necessary to merge the plant capacity database with the quarterly emissions databases in
order to be able to calculate capacity factors. The maximum gross load database
includes the ORISPL and unit/stack IDs. Therefore, by matching plant and unit/stack IDs
between the combined database of Step 2 with the maximum gross load database, a new
database can be created that includes both sets of information. Thus, Step 3 is
accomplished by a macro that searches the maximum gross load database and inserts
this information into the combined database of Step 2.
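The matching in Step 3, and the capacity factor it makes possible, can be illustrated with the following Python sketch. The function names, field names, and values are hypothetical and are not taken from the project's macros.

```python
def add_max_load(combined, max_load_mw):
    """Attach the maximum gross load to each record by matching on
    (ORISPL, unit/stack ID); units without a match get None."""
    merged = {}
    for key, record in combined.items():
        merged[key] = {**record, "max_load_mw": max_load_mw.get(key)}
    return merged


def capacity_factor(load_mwh, max_load_mw, hours):
    """Ratio of actual output to maximum possible output over `hours`."""
    return load_mwh / (max_load_mw * hours)


# Hypothetical annual loads (MWh) and maximum gross loads (MW).
combined = {
    (1001, "1"): {"load_mwh": 3066000.0},
    (1001, "CS1"): {"load_mwh": 1000000.0},
}
max_loads = {(1001, "1"): 500.0}  # no capacity entry for the common stack

merged = add_max_load(combined, max_loads)
unit = merged[(1001, "1")]
cf = capacity_factor(unit["load_mwh"], unit["max_load_mw"], hours=8760)
print(f"capacity factor: {cf:.2f}")
```

Records left with `max_load_mw` equal to `None` cannot have a capacity factor calculated and therefore correspond to the incomplete records screened out in Section 3.3.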
Final Combined Database
After completing Steps 1, 2, and 3, a new database has been created which
includes all 10 available quarters of information regarding NOx emissions, NOx control
technology, and operation data for all units or stacks from the 1st quarter of 1997 to the
2nd quarter of 1999. This new database is referred to as "EPA_all" and is in the form of a
Microsoft Excel™ spreadsheet.
3.3 Data Screening and Quality Assurance
In the "EPA_all" database described in the previous section, each power plant
unit or stack is a unique record. However, not every record can be used, because within
some records there are missing fields of data. For example, for some units or stacks,
there may not be information regarding the maximum gross capacity, the control
technology, or the emission rate. Without any one of these pieces of information, it is not
possible to completely characterize both the activity factor and emission factor for that
particular unit or stack. Thus, records that are incomplete were screened out of the
database to create a "clean" database comprised only of complete records. Furthermore,
information not needed for this study, such as for sulfur dioxide emissions, also were
screened out of the database. To accomplish these screening activities, a four-step
process was used:
Step 1: Remove Unnecessary Data Fields
Unnecessary data fields were removed from the database. These fields include:
SO2 Control; Total Quarterly SO2 Emissions; and Total Quarterly CO2 Emissions. The
remaining fields included in the database include: ORISPL identification number;
Unit/Stack ID; Primary Fuel; Boiler Type; Maximum MW gross load; NOx Control
Technology; Operating Time; Actual Gross Load; Total Quarterly Heat Input; and
Average Hourly NOx Emission Rate.
Step 2: Identify Units in Which Control Technology Changes
A notation was added to units for which the NOx control technology changes from
one quarter to another because of retrofits or modifications to the unit. Since the activity
and emissions will be classified by boiler type and control technology, it is important not
to combine data for different control technologies even though the data are for the same
unit. By noting those units which have changes in control technology, it is possible to
avoid misclassification of activity and emissions.
Step 3: Separate Incomplete and Mixed Records from the Main Database
The purpose of this step is to remove from the main database all records that are
missing critical data fields or that have changes in control technology from one quarter
to another. The resulting database, therefore, contains complete records and no
ambiguity regarding the control technology employed for a given unit. The records with
missing data were saved to another database named “Missing Data.” The records with changes
in control technology were saved to a separate database named “Mixed Data.” Missing
data and mixed data cannot be used in the statistical analysis, but they are kept in
separate tables for possible later use.
Step 4: Common and Multiple Stack Records
For a common stack configuration, data are typically reported only at the stack.
Therefore, in these cases, it is often not possible to distinguish emissions for a single
unit. Instead, only the average emissions for those units that feed into the common stack
can be calculated. For multiple stack configurations, it is often possible to estimate the
activity and emissions for an individual unit. In this step, data for common or multiple
stack configurations are recorded into the database so that duplicate records are
eliminated.
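The separation of complete, incomplete, and mixed records described in Steps 2 and 3 can be sketched in Python as shown below. The record layout and field names are hypothetical; only the classification rules follow the text.

```python
def screen(records):
    """Split records into clean, missing-data, and mixed-technology lists."""
    clean, missing, mixed = [], [], []
    for rec in records:
        quarters = rec["quarters"]
        incomplete = rec["max_load_mw"] is None or any(
            q["nox_control"] is None or q["nox_rate"] is None for q in quarters
        )
        if incomplete:
            missing.append(rec)  # a critical field is absent
        elif len({q["nox_control"] for q in quarters}) > 1:
            mixed.append(rec)    # control technology changed between quarters
        else:
            clean.append(rec)
    return clean, missing, mixed


# Three hypothetical records, one of each kind.
records = [
    {"unit": (1001, "1"), "max_load_mw": 500.0, "quarters": [
        {"nox_control": "LNB", "nox_rate": 0.41},
        {"nox_control": "LNB", "nox_rate": 0.43}]},
    {"unit": (1001, "2"), "max_load_mw": None, "quarters": [
        {"nox_control": "U", "nox_rate": 0.60},
        {"nox_control": "U", "nox_rate": 0.58}]},
    {"unit": (1002, "1"), "max_load_mw": 350.0, "quarters": [
        {"nox_control": "U", "nox_rate": 0.66},
        {"nox_control": "LNB", "nox_rate": 0.40}]},
]
clean, missing, mixed = screen(records)
print(len(clean), len(missing), len(mixed))
```

In this sketch a record that is both incomplete and mixed is classified as missing data, which is an arbitrary choice made for the example.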
3.4 The Structure of the Final Database
After the data combination and screening processes have been completed, the
final database, named "EPA_NOx_clean.xls," is ready for statistical analysis. In this
database, each record represents a unit or stack. Each record contains the following
information:
Unit/Stack Identification (Unit ID and ORISPL)
General Information (State, Region)
Technology Group (Boiler Type, NOx Control Technology)
Operation Data (Capacity, Operating Time)
Ten Quarters of NOx Emission Data
This database is used as a basis for the internal database of the prototype AUVEE
software, as described in Chapter 4.
3.5 Calculation of Emission Factors and Activity Factors
In developing an emission inventory, both activity and emission factors are
needed. An emission factor characterizes the amount of emissions produced per unit of
activity. For example, for a power plant, emission factors are often reported as mass of
pollutant produced per unit of fuel consumed. The activity factor, therefore, is the
amount of fuel consumed. To estimate fuel consumption for a power plant, one method
is to use the power plant electrical generation, which is accurately measured, and the
power plant efficiency in order to calculate the fuel input. Power plant efficiency is
typically reported as a "heat rate", which is the ratio of fuel input with respect to
electricity generation, in units of BTU of fuel input per kWh of electricity generated.
Power plant load is often summarized using the previously defined capacity factor.
Four quantities are calculated from the combined database developed in this
project. These quantities are: (1) unit/stack heat rate (BTU/kWh); (2) unit/stack capacity
factor (actual kWh generated/maximum possible kWh); (3) NOx emission rate on a fuel
input basis (g/GJ); and (4) NOx emission rate on an energy output basis (g/GJ). Data from
the final database are used to calculate the average emission factors and activity factors
for each unit or stack. The averaging time includes 12-month averages and 6-month
averages. The factors are calculated as follows for the 12-month and 6-month averaging
times, respectively:
12-month Averaging Time

For the i-th quarter, let $H_i$ be the Total Quarterly Heat Input [10^6 BTU], $L_i$ the Total Quarterly Unit Load [MWh], and $E_i$ the Quarterly Average NOx Emission Rate [lb/10^6 BTU]. $L_{max}$ is the Maximum Load [MW].

$$\text{12-month Average Heat Rate}\ \left[\frac{\text{BTU}}{\text{kWh}}\right] = \frac{\sum_{i=1}^{4} H_i}{\sum_{i=1}^{4} L_i} \times 1000$$

$$\text{12-month Average Capacity Factor} = \frac{\sum_{i=1}^{4} L_i}{L_{max} \times 8760\ [\text{hours}]}$$

$$\text{12-month Average NOx Emission Rate on Fuel Input Basis}\ \left[\frac{\text{g}}{\text{GJ}}\right] = \frac{\sum_{i=1}^{4} \left(E_i \times H_i\right)}{\sum_{i=1}^{4} H_i} \times 430\ \left[\frac{\text{g} \cdot 10^{6}\ \text{BTU}}{\text{lb} \cdot \text{GJ}}\right]$$

$$\text{12-month Average NOx Emission Rate on Energy Output Basis}\ \left[\frac{\text{g}}{\text{GJ}}\right] = \frac{\sum_{i=1}^{4} \left(E_i \times H_i\right)}{\sum_{i=1}^{4} L_i} \times 126\ \left[\frac{\text{g} \cdot \text{MWh}}{\text{lb} \cdot \text{GJ}}\right]$$
6-month Averaging Time

For the i-th quarter, let $H_i$ be the Total Quarterly Heat Input [10^6 BTU], $L_i$ the Total Quarterly Unit Load [MWh], and $E_i$ the Quarterly Average NOx Emission Rate [lb/10^6 BTU]. $L_{max}$ is the Maximum Load [MW]. The sums run over the 2nd and 3rd quarters.

$$\text{6-month Average Heat Rate}\ \left[\frac{\text{BTU}}{\text{kWh}}\right] = \frac{\sum_{i=2}^{3} H_i}{\sum_{i=2}^{3} L_i} \times 1000$$

$$\text{6-month Average Capacity Factor} = \frac{\sum_{i=2}^{3} L_i}{L_{max} \times 4380\ [\text{hours}]}$$

$$\text{6-month Average NOx Emission Rate on Fuel Input Basis}\ \left[\frac{\text{g}}{\text{GJ}}\right] = \frac{\sum_{i=2}^{3} \left(E_i \times H_i\right)}{\sum_{i=2}^{3} H_i} \times 430\ \left[\frac{\text{g} \cdot 10^{6}\ \text{BTU}}{\text{lb} \cdot \text{GJ}}\right]$$

$$\text{6-month Average NOx Emission Rate on Energy Output Basis}\ \left[\frac{\text{g}}{\text{GJ}}\right] = \frac{\sum_{i=2}^{3} \left(E_i \times H_i\right)}{\sum_{i=2}^{3} L_i} \times 126\ \left[\frac{\text{g} \cdot \text{MWh}}{\text{lb} \cdot \text{GJ}}\right]$$
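The four quantities defined for the two averaging times can be computed with one small Python function, shown here as a sketch under stated assumptions. Heat input is assumed to be in million BTU and the emission rate in lb/10^6 BTU, as in the data field descriptions of Section 3.1. The function name, field names, and the two quarterly records are hypothetical. The constants 430 and 126 are the rounded conversion factors (about 453.6 g/lb divided by 1.055 GJ per 10^6 BTU, and by 3.6 GJ per MWh, respectively).

```python
LB_PER_MMBTU_TO_G_PER_GJ = 430.0  # approx. 453.6 g/lb / 1.055 GJ per 10^6 BTU
LB_PER_MWH_TO_G_PER_GJ = 126.0    # approx. 453.6 g/lb / 3.6 GJ per MWh


def average_factors(quarters, max_load_mw, hours):
    """Heat rate, capacity factor, and NOx emission rates averaged over the
    given quarters (four quarters with 8760 h, or quarters 2-3 with 4380 h)."""
    heat = sum(q["heat_input_mmbtu"] for q in quarters)  # 10^6 BTU
    load = sum(q["load_mwh"] for q in quarters)          # MWh
    nox_lb = sum(q["nox_rate_lb_per_mmbtu"] * q["heat_input_mmbtu"]
                 for q in quarters)                      # lb of NOx
    return {
        "heat_rate_btu_per_kwh": 1000.0 * heat / load,
        "capacity_factor": load / (max_load_mw * hours),
        "nox_g_per_gj_fuel": LB_PER_MMBTU_TO_G_PER_GJ * nox_lb / heat,
        "nox_g_per_gj_output": LB_PER_MWH_TO_G_PER_GJ * nox_lb / load,
    }


# Hypothetical 2nd and 3rd quarter data for one 500 MW unit.
q2 = {"heat_input_mmbtu": 7350000.0, "load_mwh": 700000.0,
      "nox_rate_lb_per_mmbtu": 0.40}
q3 = {"heat_input_mmbtu": 8400000.0, "load_mwh": 800000.0,
      "nox_rate_lb_per_mmbtu": 0.45}
six_month = average_factors([q2, q3], max_load_mw=500.0, hours=4380)
for name, value in six_month.items():
    print(f"{name}: {value:.4g}")
```

The emission rate is weighted by heat input before averaging, so quarters with more fuel use contribute more to the average, as in the equations above.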
The emissions and activity data are calculated for selected technology groups.
Four of the technology groups were selected based upon the most prevalent types of
units in the data base. These include: (1) dry bottom, wall-fired boilers with no NOx
control; (2) dry bottom, wall-fired boilers with low NOx burners (LNB); (3) tangential-fired
boilers with no NOx controls; and (4) tangential-fired boilers with low NOx burners and
overfire air option 1, referred to as LNC1. Table 3-1 lists these technology groups and
the number of units included in the 6-month and 12-month averages. Typically, the
number of units is similar for the two averages. However, there are sometimes fewer
units for which 12-month averages were calculated compared to the number for which 6-
month averages were calculated. This is because a 12-month average cannot be
calculated if data are missing for either the 1st or 4th quarters, even though data may be
available for the 2nd and 3rd quarters. The latter are all that are needed for the six month
average calculation. For each of the first four technology groups, between 36 and 136
data points are available.
In addition, one other technology group was selected that has a small sample size.
The reason for selecting this group was to demonstrate that the probabilistic method for
developing estimates of variability and uncertainty in emission inventories is able to deal
with small data sets. The category for dry bottom, turbo-fired boilers with overfire air
has only six data points and was selected for inclusion in the case study.
Table 3-1. Summary of Data for Use in Case Studies.

Technology                                                         Number of Units Considered   Number of Units Considered
                                                                   for 6-month Average          for 12-month Average
Dry Bottom Wall-fired Boilers with No NOx Controls (DB/U)          87                           84
Dry Bottom Wall-fired Boilers with Low NOx Burners (DB/LNB)        98                           98
Tangential Fired Boilers with No NOx Controls (T/U)                136                          134
Tangential Fired Boilers Using Low NOx Burners &
  Overfire Air Option 1 (T/LNC1)                                   41                           36
Dry Bottom Turbo-Fired Boilers with Overfire Air (DTF/OFA)         6                            6
3.6 Evaluation of Possible Statistical Dependencies in the Database
In order to simplify the development of a database for use in case studies, possible
statistical dependencies within the database were evaluated. To simplify the database as
much as possible, it is desirable to be able to select data for one representative year. For
both 1997 and 1998, there are four quarters of data. There were only two quarters of data
available for 1999 at the time that this work was done. Therefore, the data for 1997 and
1998 were compared to identify similarities and differences between them. In addition,
possible dependencies between activity and emission factors were evaluated.
3.6.1 Comparison of 1997 and 1998 Data
In comparing 1997 and 1998 data, 12-month averages were used for both years
and were displayed as scatter plots. Each data point in the scatter plot represents an
individual unit or stack. It is expected that there will be some variation in emissions and
activity from one year to another for a given unit. However, on average, if there are no
systematic trends overall, there will be some random scattering of data above and below a
"reference line" that has a slope of one.
Figure 3-1 displays a scatter plot of the 12-month average 1998 emission rate for
each unit versus the 12-month average of the 1997 emission rate of each unit. Most of
the data points fall close to the "reference line." For example, units that had average
emissions of approximately 300 g/GJ in 1997 also had average emissions of
approximately 300 g/GJ in 1998. There appear to be a relatively small number of units
that have noticeably lower emissions in 1998 than in 1997. For example, there is a unit
that had an emission rate of approximately 500 g/GJ in 1997 but only 200 g/GJ in 1998.
It is possible that this unit may have had some type of change in configuration or
operation that is not reflected in the available data within the database. However, with
these relatively few exceptions, the overall trend is good agreement between the 1997 and
1998 emission rates. This comparison indicates that either year might serve as a
representative basis for characterizing emissions.
A similar graph is shown in Figure 3-2 regarding a scatterplot of 1998 capacity
factor versus 1997 capacity factor for individual units. While there is considerably more
relative scatter, on average the capacity factors are similar between the two years.
[Scatter plot. Horizontal axis: NOx Emission Rate (gram/GJ fuel input) (1997), 0 to 600. Vertical axis: NOx Emission Rate (1998), 0 to 600.]
Figure 3-1. Scatter plot of 6-month NOx Emission Rate of 1997 and 1998
(No. of Data=390)
[Scatter plot. Horizontal axis: Capacity Factor (1997), 0 to 1. Vertical axis: Capacity Factor (1998), 0 to 1.]
Figure 3-2. Scatter plot of 12-month Capacity Factor of 1997 and 1998
(No. of Data=390)
3.6.2 Evaluation of Possible Dependencies Between Activity and Emission
Factors
The key purpose of this analysis is to identify whether it is reasonable to treat
heat rate, capacity factors, and emission factors (on a fuel input basis) as statistically
independent. Statistical independence would allow for a simpler approach to the
probabilistic simulation of an emission inventory.
To evaluate possible dependencies among variables, scatter plots were developed
of the data for one variable versus another variable. Figures 3-3 through 3-5 show the
scatter plots of: (1) heat rate versus capacity factor; (2) emission rate versus capacity
factor; and (3) heat rate versus emission rate, respectively. The scatter plots are based on
data for Tangential-Fired Boilers with Low NOx Burners and Overfire Air Option 1 for a
6-month averaging time. These results are typical of other technology groups.
In Figure 3-3, it appears that there is no systematic trend of changes in the average
heat rate with respect to capacity factor. While there is considerable variation in heat
rate, the range of variation is not significantly dependent on the capacity factor.
Therefore, it appears that these two quantities are not statistically dependent upon each
other in any significant way. Thus, for purposes of developing an emission inventory, we
assume that these two quantities vary in a statistically independent manner.
In Figure 3-4, it appears that there is not a systematic trend of emission rate with
respect to capacity factor. In other words, the average value of the emission rate does not
depend on the value of the capacity factor. Furthermore, there is variability in the
emission rate for various capacity factors. Because of the limited amount of data, it is not
possible to make a very quantitative assessment of the statistical dependence between
emission rate and capacity factor. However, from a qualitative perspective, it appears
that these two quantities are approximately statistically independent of each other. With
statistical samples of data, one should not place too much emphasis on patterns that
depend on a small number of data points. For example, the one relatively high emission
rate shown in Figure 3-4 is not sufficient evidence, by itself, to indicate that there is more
variability in emissions at high capacity factors than at low capacity factors.
[Scatter plot. Horizontal axis: Capacity Factor, 0 to 1. Vertical axis: Heat Rate (BTU/kWh), 6000 to 14000.]
Figure 3-3. Scatter Plot for 6-month Average Heat Rate versus 6-month Average
Capacity Factor for Tangential-Fired Boilers Using Low NOx Burners and Overfire Air
Option 1. (n=41)
[Scatter plot. Horizontal axis: Capacity Factor, 0 to 1. Vertical axis: NOx Emission Rate (gram/GJ), 0 to 400.]
Figure 3-4. Scatter Plot for 6-month Average NOx Emission Rate versus 6-month
Average Capacity Factor for Tangential-Fired Boilers Using Low NOx Burners and
Overfire Air Option 1. (n=41)
[Scatter plot. Horizontal axis: NOx Emission Rate (gram/GJ fuel input), 0 to 400. Vertical axis: Heat Rate (BTU/kWh), 6000 to 14000.]
Figure 3-5. Scatter Plot for 6-month Average NOx Emission Rate versus 6-month
Average Heat Rate for Tangential-Fired Boilers Using Low NOx Burners and Overfire
Air Option 1. (n=41)
In Figure 3-5, it appears that there is not a statistically significant relationship
between heat rate and emission rate. Most of the data are in a cluster with heat rates
between approximately 9,000 and 12,000 BTU/kWh and emission rates between
approximately 120 g/GJ and 200 g/GJ. The data points indicating substantially higher
and lower emissions do not appear to have heat rates any different than those for the data
points within the central cluster. Therefore, there is no apparent trend of emissions with
respect to heat rate, and for modeling purposes we will treat these two quantities as
statistically independent.
Similar results were obtained in an earlier study by Frey et al. (1999).
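The assessment above is based on visual inspection of scatter plots. As a supplementary check that is not part of the method used in this study, a sample correlation coefficient could be computed for each pair of variables. The following Python sketch shows one way to do so; the function name and the data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 6-month averages for a handful of units, for illustration only.
capacity_factor = [0.45, 0.52, 0.61, 0.66, 0.70, 0.74, 0.81, 0.88]
emission_rate = [171.0, 149.0, 182.0, 158.0, 166.0, 143.0, 177.0, 160.0]  # g/GJ
r = pearson_r(capacity_factor, emission_rate)
print(f"correlation between capacity factor and emission rate: {r:.2f}")
```

A coefficient near zero is consistent with, but does not prove, statistical independence, because it measures only linear association. With small samples, a few points can change the value considerably, which is the same caution noted above for the scatter plots.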
3.7 Statistical Summary of the Database
The final set of data for both activity and emission factors for the five selected
technology groups are summarized in Tables 3-2 and 3-3 for the 6-month and 12-month
averaging times, respectively. For each technology group, the three factors required to
calculate the emission inventory are shown. The average value of each of these factors is
provided. The inter-unit variability in these factors is indicated by the standard deviation.
For example, for the dry bottom wall-fired boilers with no NOx control, the heat rate has
a mean value of 11,190 BTU/kWh and a standard deviation of 1,440 BTU/kWh based
upon a six month average, and a mean value of 11,150 BTU/kWh and a standard
deviation of 1,450 BTU/kWh based upon a 12-month average. Although the values are
similar for the 6-month and 12-month averages, they are not identical. This is because
the 12-month average differs from the 6-month average in that it includes two additional
quarters of data. However, differences between the 12-month and 6-month averages are
within statistical sampling error.
Table 3-2. Statistical Summary of the 1998 6-month Database for Five Selected Technology Groups

Technology / Variable(a)                  Number of Data Points   Mean      Standard Deviation
Dry Bottom Wall-Fired Boilers with No NOx Controls
  Heat Rate                               87                      11,190    1,440
  Capacity Factor                         87                      0.59      0.18
  NOx Emission Rate                       87                      291       90
Dry Bottom Wall-Fired Boilers with Low NOx Burners
  Heat Rate                               98                      10,570    800
  Capacity Factor                         98                      0.69      0.14
  NOx Emission Rate                       98                      176       42
Tangential-Fired Boilers with No NOx Controls
  Heat Rate                               136                     10,860    1,340
  Capacity Factor                         136                     0.62      0.15
  NOx Emission Rate                       136                     196       55
Tangential-Fired Boilers Using Low NOx Burners & Overfire Air Option 1
  Heat Rate                               41                      10,590    850
  Capacity Factor                         41                      0.69      0.14
  NOx Emission Rate                       41                      163       37
Dry Bottom Turbo-Fired Boilers with Overfire Air
  Heat Rate                               6                       10,420    910
  Capacity Factor                         6                       0.71      0.09
  NOx Emission Rate                       6                       191       19

(a) Units: Heat Rate (BTU/kWh); Capacity Factor (actual kWh/maximum possible kWh); and NOx Emission Rate (g NOx as NO2/GJ of fuel input)
Table 3-3. Statistical Summary of the 1998 12-month Database for Five Selected Technology Groups

Technology / Variable(a)                  Number of Data Points   Mean      Standard Deviation
Dry Bottom Wall-Fired Boilers with No NOx Controls
  Heat Rate                               84                      11,150    1,450
  Capacity Factor                         84                      0.53      0.19
  NOx Emission Rate                       84                      293       83
Dry Bottom Wall-Fired Boilers with Low NOx Burners
  Heat Rate                               98                      10,610    890
  Capacity Factor                         98                      0.67      0.14
  NOx Emission Rate                       98                      177       41
Tangential-Fired Boilers with No NOx Controls
  Heat Rate                               134                     10,780    1,290
  Capacity Factor                         134                     0.56      0.18
  NOx Emission Rate                       134                     198       54
Tangential-Fired Boilers Using Low NOx Burners & Overfire Air Option 1
  Heat Rate                               36                      10,730    790
  Capacity Factor                         36                      0.65      0.20
  NOx Emission Rate                       36                      161       37
Dry Bottom Turbo-Fired Boilers with Overfire Air
  Heat Rate                               6                       10,360    900
  Capacity Factor                         6                       0.66      0.07
  NOx Emission Rate                       6                       191       17

(a) Units: Heat Rate (BTU/kWh); Capacity Factor (actual kWh/maximum possible kWh); and NOx Emission Rate (g NOx as NO2/GJ of fuel input)
One measure of the variability in a data set is the ratio of the standard deviation to
the mean, referred to as the coefficient of variation or relative standard deviation. For
example, for the dry bottom wall-fired boilers with no NOx control, the coefficient of
variation for the 6-month average data is [1,440 BTU/kWh]/[11,190 BTU/kWh] = 0.129.
This indicates that the standard deviation is 12.9 percent of the mean value. In contrast,
the coefficient of variation for the emission factor for the same technology group and
averaging time is 0.309, indicating that there is relatively more variation in emission rate
than in heat rate. These types of statistical summaries provide insight regarding which
quantities in the data base have more inter-unit variability than others.
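The coefficient of variation comparison can be reproduced directly from the tabulated means and standard deviations. The following is a minimal sketch using the 6-month values reported in Table 3-2 for dry bottom wall-fired boilers with no NOx controls; the dictionary keys and the function name are illustrative choices, not part of AUVEE.

```python
# Means and standard deviations copied from Table 3-2 (6-month averages,
# dry bottom wall-fired boilers with no NOx controls).
summary = {
    "heat_rate_btu_per_kwh": (11190.0, 1440.0),
    "capacity_factor": (0.59, 0.18),
    "nox_emission_rate_g_per_gj": (291.0, 90.0),
}


def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation divided by the mean."""
    return std_dev / mean


cv = {name: coefficient_of_variation(m, s) for name, (m, s) in summary.items()}

# List the quantities from most to least relative inter-unit variability.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: CV = {value:.3f}")
```

Ranking the quantities by coefficient of variation puts them on a common dimensionless scale, so that variability in heat rate (BTU/kWh) can be compared with variability in emission rate (g/GJ).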
The data described in this chapter are used as input to a computer model that
enables calculation of probabilistic emission inventories. The implementation of the
computer model is described in the next chapter.
4.0 AUVEE SYSTEM DEVELOPMENT AND IMPLEMENTATION
The probabilistic methodology for emission inventory estimation was
implemented in a prototype software, AUVEE. In this chapter, we introduce the
functional design of AUVEE, the main modules and databases, and the relationships
among the modules and databases.
4.1 General Structure of the AUVEE Prototype Software
In AUVEE, the user sets up a project. The project contains information on the
choice of an internal emission factor and activity factors database, project name, project
comments, and user data regarding the number of power plant units included in the
inventory, the boiler and emissions control technology for each unit, and the capacity of
each unit.
Figure 4-1 shows the conceptual design of AUVEE. AUVEE is composed of
three databases, which include an internal database, a user input database and an interim
database. In addition, AUVEE includes four main modules: (1) fitting distributions; (2)
characterizing uncertainty; (3) calculating emission inventories; and (4) user data input.
AUVEE features an interactive Graphical User Interface (GUI).
4.2 Databases in the AUVEE Prototype Software
The internal database for AUVEE includes emission and activity factors obtained
from CEMS data. The development of the internal database was described in detail in
Chapter 3. The user may select either a 6-month average or a 12-month average database
as the basis for developing either a 6-month or 12-month emission inventory,
respectively. The internal database cannot be modified by the user in the prototype
version of the software.
The user input database stores data that the user provides regarding the number of
power plant units in the emission inventory that the user wants to calculate, the boiler and
emission control technology for each unit, and the capacity of each unit. This database
can be edited by the user via the user data input module shown in Figure 4-1.
Figure 4-1. Conceptual Design of the Analysis of Uncertainty and Variability in Emissions Estimation (AUVEE) Prototype Software System
The interim database in AUVEE is used to store the results from the fitting
distribution module and to store project information. The interim database provides fitted
distribution information needed by the uncertainty analysis and emission inventory
modules shown in Figure 4-1. A default interim database is provided so that the user can
proceed to calculate emission inventory results even without making a new selection of
parametric distributions to represent each input to the emission inventory. The advantage
of the interim database is that it can be used to store default assumptions and can be
modified by the user to save project-specific assumptions. The interim database also
allows for data to flow between modules of the software.
4.3 Modules in the AUVEE Prototype Software
In this section, each of the four modules indicated in Figure 4-1 is described. The
GUI is also briefly described.
[Figure 4-1 diagram elements: Internal Database, Interim Database, and User Input Database; Fitting Distribution Module, Uncertainty Analysis Module, Emission Inventory Module, and User Data Input Module; Interactive Graphic User Interface Module; File Input and Output; Tabular Output and Calculation Result Graphic Output.]
4.3.1 Fitting Distribution Module
The fitting distribution module implements all calculations for fitting parametric
distributions to emission factor and activity factor data. This module provides graphs
comparing fitted distributions to the data, allowing the user to evaluate the goodness of fit
of parametric distributions fitted to datasets from the internal database. The user has the
option, via a pull-down menu, to select alternative parametric distributions for fit to the
data. When the user exits the fitting distribution module, the current set of fitted
distributions is saved to the interim database for use by other modules in the program.
4.3.2 Characterizing Uncertainty Module
The characterizing uncertainty module implements the function of characterizing
uncertainty in emission factors or activity factors based upon the internal database and
based upon the number of units of each technology group that are in the internal database.
The characterizing uncertainty module uses data from the interim database to get
distribution information including distribution type and the parameters of the fitted
distributions for emission and activity factors. Uncertainty estimates of the mean
emission and activity factors, and other statistics, are calculated using the numerical
method of bootstrap simulation. The results of the uncertainty analysis are displayed in
the GUI. Because this module uses data from the internal database, which may contain a
relatively large number of power plant units compared to an individual state emission
inventory, the estimates of uncertainty in the mean and in other statistics are typically a
lower bound on the range of uncertainty in the same statistic applicable to an emission
inventory that includes a smaller number of power plant units.
4.3.3 Emission Inventory Module
The emission inventory module has the following functions: (1) it allows the user
to visit the user database and append, modify or delete user input data; (2) it characterizes
the uncertainty in emission factors and activity factors based on user project data; (3) it
calculates uncertainty in the emission inventory; and (4) it calculates the key sources of
uncertainty from among the different technology groups. It is via the emission inventory
module that the user has access to the user data input module. The estimates of
uncertainty in the emission inventory module are based upon the number of power plant
units of each technology group specified by the user. For example, although there may
be 36 power plant units of a given type in the internal database, the user may have only
10 units of that type in the emission inventory of interest. The uncertainty in the
emission and activity factors for that technology group will be estimated based upon a
sample size of 10, not 36.
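The effect of sample size on uncertainty in the mean can be illustrated with a small parametric bootstrap. This is a sketch only, not the AUVEE Fortran implementation. It assumes that the T/LNC1 12-month emission factor is gamma distributed with shape 18.51 and scale 8.7 (one reading of the values in Table 5-2, chosen because it reproduces the tabulated mean of about 161 g/GJ); the function name, the number of replications, and the seed are arbitrary choices.

```python
import random
import statistics


def bootstrap_mean_interval(n_units, shape, scale, n_boot=2000, seed=1):
    """Parametric bootstrap of the mean of n_units gamma-distributed values.

    Returns the 2.5th and 97.5th percentiles of the bootstrap means.
    """
    rng = random.Random(seed)
    means = []
    for _ in range(n_boot):
        # One bootstrap replication: a synthetic data set of n_units values.
        sample = [rng.gammavariate(shape, scale) for _ in range(n_units)]
        means.append(statistics.fmean(sample))
    # n=40 gives cut points at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(means, n=40)
    return cuts[0], cuts[-1]


# Assumed gamma parameters (shape, scale) for the emission factor, g/GJ.
SHAPE, SCALE = 18.51, 8.7

lo36, hi36 = bootstrap_mean_interval(36, SHAPE, SCALE)
lo10, hi10 = bootstrap_mean_interval(10, SHAPE, SCALE)

print(f"n = 36: 95% range of the mean = {lo36:.1f} to {hi36:.1f} g/GJ")
print(f"n = 10: 95% range of the mean = {lo10:.1f} to {hi10:.1f} g/GJ")
```

The range for 10 units is wider than the range for 36 units, which is why the emission inventory module bases its uncertainty estimates on the number of units specified by the user rather than on the number in the internal database.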
4.3.4 User Data Input Module
The user data input module is packaged with the emission inventory module. The
user data input module is the portion of the software that enables the user to add, modify,
or delete information in the user database.
4.3.5 Graphical User Interface (GUI)
The GUI is actually a general control module in AUVEE, and it makes all of the
independent modules, platforms and databases work together. In addition, the GUI is a
bridge which links user input to internal implementation within AUVEE, and provides
model output to the user. Through the GUI, the user can build or open a project, enter a
database of emission sources, select parametric distributions, view
or save all calculation results, and manage the message passing between the different
modules.
4.4 Software Development Tools
AUVEE was developed for the Windows 95/98 platform. Different software
development tools were used for different aspects of the software system, chosen
according to functional requirements and convenience of implementation. The roles
of the software tools used to develop the AUVEE prototype software are as follows:
• Visual Fortran 6.0, a product of Digital Equipment Corporation (now Compaq),
was used as the programming language for the algorithms that implement the
probabilistic simulation capabilities.
• Microsoft Access, a product of Microsoft Corporation, was used to develop the
internal and user databases.
• Visual C++ 6.0, a product of Microsoft Corporation, was used to develop the
GUI.
• Graphics Server 5.1, a product of Bits Per Second Ltd., was used to produce charts
for visualization of data, fitted distributions, and bootstrap simulation results.
These charts are contained within the GUI.
More detail regarding the prototype AUVEE software is available in the User's Manual
(Frey and Zheng, 2000).
5.0 DEVELOPMENT OF A PROBABILISTIC EMISSION INVENTORY
In practice, emission inventories are often obtained by multiplying emission
factors and activity factors for specific source categories to obtain an estimate of total
emissions for the source category, and then by adding the total emissions for multiple
source categories. Emission factors are typically assumed to be representative of an
average emission rate from a population of pollutant sources in a specific category (EPA,
1995). However, there may be uncertainty in the population average emissions because
of random sampling error, measurement errors, or possibly because the sample of power
plants from which the emission factor was developed was not a representative sample.
These first two factors typically lead to imprecision in the estimate of the population
average, whereas the third factor may lead to possible biases or systematic errors in the
estimated average.
Lack of knowledge regarding the true average emission factor may lead to
erroneous estimates of total emissions, which has implications for various decision-
making activities. Examples of the latter might include estimating trends in emissions
from year to year, comparing emissions estimates to statewide emissions budgets, or
predicting ambient air quality based upon an estimated emission inventory. Errors in the
inventory can lead to errors in inferences or decisions. In order to avoid errors in
inferences made based upon emission inventories, it is important to understand and
account for the uncertainty in the inventory.
In this chapter, we will present: (1) a general methodology used in this work to
develop a probabilistic emission inventory; (2) the emission inventory model used in the
AUVEE prototype software tool; (3) a summary of probability distribution models of the
variability in emission inventory model inputs based upon the internal database of the
AUVEE prototype software tool; (4) a probabilistic approach for estimating uncertainty
in the emission inventory; and (5) a method for calculation of the relative importance of
input uncertainties with respect to uncertainty in the total inventory.
5.1 General Approach
In this section, we briefly describe a general method used to develop a
probabilistic emission inventory with the help of a conceptual example. In this example
the total emissions from a population of emission sources are to be estimated. Emission
factor and activity factor data sets representative of the population of emission sources
are developed. Initially, probability distributions are developed for the emission factor
data set and the activity factor data set. These probability distributions typically represent
inter-plant variability for a specified averaging time.
In a hypothetical case in which the measurement error and the random sampling
error are negligible for both the emission factor and the activity factor data sets, the
distribution of values for the emissions and activity factors would represent actual inter-unit
variability. In such a case, the average emission factor and the average activity factor
could be estimated based upon an arithmetic average of the data. Alternatively, to
develop an emission inventory, the actual emission factor for each individual source
within the population would be multiplied by the actual activity for each individual
source, to obtain an estimate of the emissions for each individual source. The emissions
for each individual source would be summed over the entire population to obtain a point
estimate of emissions. This case is illustrated in Figure 5-1. The main point here is that,
even though there are probability distributions for variability in emission factors and
activity factors, the final result is a point estimate without uncertainty as long as there is
perfect knowledge regarding variability.
Figure 5-1. Flow Diagram Illustrating the Propagation of Variability in Emission Inventory Inputs to Obtain a Point Estimate of Total Emissions. [Diagram: Activity Factor (Variable) and Emission Factor (Variable) enter the Emission Inventory Model, which yields a Point Estimate of Total Emissions.]
Of course, in practical applications, there is not an exhaustive census of emission
and activity factors for every individual source. Only a small sample of sources within a
population are typically available for development of emission and activity factors.
Measurements may contain measurement errors. The limited size of data sets will reflect
random sampling error, if the sample is in fact random. If the sample is not random, then
there may be biases in the mean value and the range of values of the observed sample. If
the sample is not truly random, then it may be possible to identify the magnitude of
possible biases by analyzing subsets of the available data. For example, a dataset may
display bimodal or multimodal characteristics, indicating that the sample includes two or
more different subpopulations of emission sources. The relative proportion of these
different subsets of emission sources in the available sample may be different than the
relative proportion in the total population. Thus, it may be possible to reweight some of
the data in order to obtain a more representative estimate of emission and activity factors.
The issue of representativeness is addressed in a case study for an AP-42 emission factor in
a paper by Rhodes and Frey (1997). General considerations regarding representativeness
were covered in an EPA-sponsored workshop on Monte Carlo methods (EPA, 1999).
As a second conceptual example, assume that measurement errors may be
significant, even though the sample size is very large. In this case, there is uncertainty
regarding the true value of each individual data point. Consequently, there is also
uncertainty regarding the true value of the frequency distribution regarding variability
among sources within the population. As a result, there is uncertainty in any estimate of
any statistic of the population, such as the mean emission rate.
As a third conceptual example, consider a situation in which there is no
measurement error but in which the sample size of the random sample of data is
relatively small. In this case, there may be substantial random sampling error
contributing to lack of knowledge regarding any statistics calculated from the data or
regarding the best estimate of the frequency distribution for variability in the population.
In this situation, as in the second example, there are alternative possible frequency
distributions for each input, any one of which might represent the “true” distribution.
The family of alternative possible frequency distributions for each inventory
input, as arises in the second and third examples given here, is shown in
Figure 5-2 as a range of possible values for the cumulative distribution function of that
model input. The variable and uncertain emission and activity factors are then propagated
through the emission inventory model to simulate the uncertainty in the estimate for the
total emissions from a population of emission sources. In this case, the true values of the
emission and activity factors for each source are unknown. Hence, uncertainty in
emission and activity factors applied to individual sources is reflected by a distribution of
uncertainty for the total emissions.
An emission inventory could also be both variable and uncertain. For example,
both the estimate of average hourly emissions for input to an air quality model and the
range of uncertainty in that estimate may differ from hour to hour. In this fourth
conceptual example, there is temporal variability in emissions and uncertainty in
emissions for any given point in time. Similarly, there could be spatial variability in the
mean and range of uncertainty of emissions in the grid cells of an air quality model.
Figure 5-2. Flow Diagram Illustrating the Propagation of Variability and Uncertainty in Emission Inventory Inputs to Quantify the Uncertainty in the Estimate of Total Emissions. [Diagram: Activity Factor (Variable & Uncertain) and Emission Factor (Variable & Uncertain) enter the model Total Emissions = Activity Factor × Emission Factor, which yields the Uncertainty in Estimate of Total Emissions.]
The general approach employed to quantify variability and uncertainty in emission
inventories and emission factors can be summarized as the following major steps:
1. Compilation and evaluation of a database for emission and activity factors.
2. Visualization of data by developing empirical cumulative distribution
functions for individual activity and emission factors. Scatter plots are also
developed in order to evaluate dependencies between pairs of activity and
emission factors, and to evaluate possible autocorrelations or seasonal
variations over time.
3. Fitting, evaluation, and selection of alternative parametric probability
distribution models for representing variability in activity data and emission
factor data.
4. Characterization of uncertainty in the distributions for variability.
5. Propagation of uncertainty and variability in activity and emissions factors to
estimate uncertainty in facility-specific emissions and/or total emissions from
a population of emission sources.
6. Calculation of importance of uncertainty.
Step 1 through Step 4 have been described separately in Chapters 2 and 3. The
remaining steps are described in the following sections.
5.2 Emission Inventory Model
In the development of an emission inventory, an emission factor is often used
because it greatly simplifies the estimation of emissions. As mentioned previously,
emission estimates can be obtained by multiplying an emission factor with an activity
factor that represents the extent of the emissions-generating activity:
$$E = A \times EF \qquad (5\text{-}1)$$
where:
E = emissions (e.g., lb of NOx as NO2)
A = activity factor (e.g., tons of coal burned)
EF = emission factor (e.g., lb of NOx as NO2 per ton of coal burned)
For a power plant unit, the activity data includes the unit heat rate (BTU of fuel input
required to produce one kWh of electricity), unit capacity factor (average capacity
utilization for a given time), and unit capacity (MW). Thus, an annual emission
inventory for a power plant unit is given by:
$$E = \frac{EF}{10^{6}} \cdot HR \cdot (CP \times 8760\ \mathrm{hr/yr}) \cdot CL \qquad (5\text{-}2)$$
where:
E = emissions (lb/year)
EF = emission factor (lb/10^6 BTU)
HR = heat rate (BTU/kWh)
CP = annual capacity factor (actual kWh generated/maximum possible kWh)
CL = capacity (MW)
If units of g/GJ are used for the emission factor, BTU/kWh for heat rate, MW for
capacity, and tons/year for the emission estimate, the emission inventory over a year for a
single unit is calculated by:
$$E = 0.000010182 \cdot EF \cdot HR \cdot CP \cdot CL \qquad (5\text{-}3)$$
where 0.000010182 is a units conversion coefficient. For a six-month emission
inventory, Equation (5-3) becomes:
$$E = 0.000005091 \cdot EF \cdot HR \cdot CP \cdot CL \qquad (5\text{-}4)$$
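The units conversion coefficient can be checked approximately from first principles. The sketch below assumes 1055.056 J per BTU and short tons of 907,184.74 g; these conversion factors are assumptions made for illustration, since the report does not state which definitions were used, and the derived value agrees with 0.000010182 only to within about 0.1 percent. The unit capacity of 500 MW in the usage example is hypothetical, while the other inputs are the T/LNC1 means from Table 3-2.

```python
import math

BTU_TO_J = 1055.056          # joules per BTU (assumed definition)
G_PER_SHORT_TON = 907184.74  # grams per short ton (assumed ton definition)
HOURS_PER_YEAR = 8760.0

C_ANNUAL = 0.000010182       # coefficient given with Equation 5-3
C_SIX_MONTH = 0.000005091    # coefficient given with Equation 5-4


def conversion_coefficient(hours):
    """Coefficient c in E [tons] = c * EF [g/GJ] * HR [BTU/kWh] * CP * CL [MW]."""
    kw_per_mw = 1000.0
    gj_per_btu = BTU_TO_J / 1.0e9
    return kw_per_mw * hours * gj_per_btu / G_PER_SHORT_TON


def unit_emissions(ef, hr, cp, cl, c=C_ANNUAL):
    """Emissions of a single unit in tons, following Equation 5-3 or 5-4."""
    return c * ef * hr * cp * cl


derived = conversion_coefficient(HOURS_PER_YEAR)
print(f"derived annual coefficient: {derived:.4e}")

# Mean T/LNC1 inputs from Table 3-2 with a hypothetical 500 MW unit.
annual = unit_emissions(ef=163.0, hr=10590.0, cp=0.69, cl=500.0)
six_month = unit_emissions(ef=163.0, hr=10590.0, cp=0.69, cl=500.0, c=C_SIX_MONTH)
print(f"annual: {annual:.0f} tons, six-month: {six_month:.0f} tons")
```

Because the six-month coefficient is half of the annual one, the same inputs give a six-month estimate that is half of the annual estimate.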
5.3 Development of Probability Distributions for the Emission Inventory Model Inputs
An emission inventory can be probabilistically characterized by the propagation
of probabilistic model inputs through the emission inventory model. For a power plant
unit, model inputs in the emission inventory model include the emission factor and
activity factors. The latter include heat rate, capacity factor and capacity (MW) for each
individual power plant unit. In this project, heat rate and capacity factor were
probabilistically characterized. Capacity was assumed to be a fixed quantity without
uncertainty and variability. However, the approach could be extended to treat capacity
probabilistically if there were reasons to believe that the reported capacities
were in error. Compared to variability and uncertainty in heat rate and capacity factor, it
is unlikely that uncertainty or variability regarding true plant capacity would play a
significant role in most cases, other than due to data recording errors (Frey et al., 1998).
All emission factors were characterized probabilistically.
In this project, probability distribution models were developed for the six-month
average and one-year average activity and emission factor data for all of the five chosen
technology groups. The data for the five technology groups was described in Chapter 3.
The methods for fitting parametric probability distributions to the data were described in
Chapter 2. The probability distribution models are used as inputs for the probabilistic
emission inventory. A summary of the distribution judged to provide the best fit to each
emission or activity factor, and the parameters of the distribution, is given in Table 5-1
for the six-month averaging time. Similar information is given in Table 5-2 for the 12-
month averaging time.
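One way to sanity-check fitted parameters is to compute the mean implied by each distribution and compare it with the sample mean in Table 3-2. The sketch below does this for four six-month entries of Table 5-1. The parameter roles used here are an assumed reading that is not spelled out this way in the table footnotes: the lognormal values are taken as the mean and standard deviation of ln(x), the beta values as two shape parameters, the Weibull values as (scale, shape), and the gamma values as (shape, scale). Under this reading the implied means fall within about 1 percent of the tabulated means.

```python
import math


def lognormal_mean(mu, sigma):
    """Mean of a lognormal with mean mu and std. dev. sigma of ln(x)."""
    return math.exp(mu + 0.5 * sigma ** 2)


def beta_mean(a, b):
    """Mean of a beta distribution with shape parameters a and b."""
    return a / (a + b)


def weibull_mean(scale, shape):
    """Mean of a two-parameter Weibull distribution."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def gamma_mean(shape, scale):
    """Mean of a gamma distribution."""
    return shape * scale


# (label, mean implied by Table 5-1 parameters, sample mean from Table 3-2)
checks = [
    ("DB/U heat rate (BTU/kWh)", lognormal_mean(9.31, 0.122), 11190.0),
    ("DB/U capacity factor", beta_mean(3.92, 2.71), 0.59),
    ("DB/U emission factor (g/GJ)", weibull_mean(323.5, 3.84), 291.0),
    ("DB/LNB emission factor (g/GJ)", gamma_mean(17.25, 10.22), 176.0),
]

for label, fitted, table in checks:
    rel_diff = (fitted - table) / table
    print(f"{label}: fitted mean = {fitted:.4g}, table mean = {table:.4g}, "
          f"difference = {100.0 * rel_diff:+.2f}%")
```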
5.4 A Probabilistic Approach for Calculating Uncertainty in the Emission Inventory of Coal-Fired Power Plants
Bootstrap simulation, introduced in Chapter 3, is used to quantify uncertainty in
the emission inventory. A probabilistic framework for calculating uncertainty in the emission
inventory using bootstrap simulation is shown in the flowchart of Figure 5-3. Based on
the different types of NOx control technology and boiler types, we can classify all units in
the inventory into different technology groups. For each unit, the capacity must be
specified. The number of units within a technology group is specified as the variable N
in Figure 5-3. Therefore, for a given technology group, we generate N random samples
for heat rate, capacity factor, and NOx emission factor from the corresponding parametric
probability distributions for each of these three quantities. Each of the N random samples
represents one unit in the emission inventory for the selected technology group. Thus,
one random sample each of heat rate, capacity factor, and emission factor are used, as in
Equation 5-3 or Equation 5-4, depending upon the averaging time, to calculate the total
emissions for a single unit. The calculation is repeated for each of the N units in the
technology group to arrive at total emissions for each individual unit.
Table 5-1. Summary of Selected Best Fit Parametric Distribution and Parameters for Emission and Activity Factors for Five Coal-Fired Power Plant Technology Groups Based Upon Six-Month Average Data.

Technology Group / Input Variable   Fitted Distribution   1st Parameter(a)   2nd Parameter(b)
DB/U
  Heat Rate                         Lognormal             9.31               0.122
  Capacity Factor                   Beta                  3.92               2.71
  Emission Factor                   Weibull               323.5              3.84
DB/LNB
  Heat Rate                         Lognormal             9.26               0.074
  Capacity Factor                   Beta                  7.02               3.18
  Emission Factor                   Gamma                 17.25              10.22
T/U
  Heat Rate                         Lognormal             9.28               0.12
  Capacity Factor                   Beta                  6.08               3.79
  Emission Factor                   Lognormal             5.24               0.27
T/LNC1
  Heat Rate                         Normal                10,590             848
  Capacity Factor                   Beta                  6.53               2.94
  Emission Factor                   Gamma                 19.03              8.58
DTF/OFA
  Heat Rate                         Lognormal             9.25               0.085
  Capacity Factor                   Beta                  0.711              0.087
  Emission Factor                   Gamma                 99.49              1.91

(a) 1st parameter in Table 5-1 is the mean for Normal distribution, the geometric mean for Lognormal, scale parameter for Gamma and Beta, and shape parameter for Weibull.
(b) 2nd parameter is the standard deviation for Normal distribution, geometric standard deviation for Lognormal, shape parameter for Weibull, Gamma and Beta.
Table 5-2. Summary of Selected Best Fit Parametric Distribution and Parameters for Emission and Activity Factors for Five Coal-Fired Power Plant Technology Groups Based Upon Twelve-Month Average Data.

Technology Group / Input Variable   Fitted Distribution   1st Parameter(a)   2nd Parameter(b)
DB/U
  Heat Rate                         Lognormal             9.31               0.12
  Capacity Factor                   Beta                  3.30               2.89
  Emission Factor                   Weibull               323.33             4.22
DB/LNB
  Heat Rate                         Lognormal             9.27               0.08
  Capacity Factor                   Beta                  6.94               3.36
  Emission Factor                   Gamma                 18.66              9.48
T/U
  Heat Rate                         Lognormal             9.28               0.11
  Capacity Factor                   Beta                  3.62               2.84
  Emission Factor                   Gamma                 13.46              14.77
T/LNC1
  Heat Rate                         Lognormal             9.28               0.07
  Capacity Factor                   Beta                  3.11               1.70
  Emission Factor                   Gamma                 18.51              8.7
DTF/OFA
  Heat Rate                         Lognormal             9.24               0.09
  Capacity Factor                   Normal                0.66               0.07
  Emission Factor                   Lognormal             5.25               0.08

(a) 1st parameter in the table is the mean for Normal distribution, the geometric mean for Lognormal, scale parameter for Gamma and Beta, and shape parameter for Weibull.
(b) 2nd parameter is the standard deviation for Normal distribution, geometric standard deviation for Lognormal, shape parameter for Weibull, Gamma and Beta.
Figure 5-3. Flowchart for Calculating Uncertainty in Emission Inventory Using Bootstrap Simulation
[Flowchart steps: (1) select a technology group and read the unit capacity data within the chosen technology group; (2) for i = 1 to B, generate N (the number of units within the chosen technology group) random samples of heat rate, capacity factor, and NOx emission factor from the corresponding distributions; (3) take one sample from each model input, enter it into the emission inventory model for a single unit, and run the model to obtain an emission inventory output for one unit, repeating until all N units in the technology group have been run through the model; (4) sum the emission inventory of all units to obtain an emission inventory output for the chosen technology group; (5) when the bootstrap replication number equals B, obtain an uncertainty distribution in the emission inventory for the chosen technology group; (6) when all technology groups have been analyzed, obtain an uncertainty distribution in the total emission inventory for all chosen technology groups.]
The sum of the emissions for all of the N units is the total emission inventory for
the technology group. The process of randomly simulating heat rate, capacity factor, and
emission factor values for all of the N units is repeated to arrive at another estimate of
total emissions for the technology group. The second estimate of total emissions will
differ from the first because of random sampling fluctuations in the inputs. This process
is repeated B times, to arrive at B estimates of the total emission inventory of the
technology group. The B estimates of total emissions for a technology group characterize
a distribution for uncertainty in the total emissions. This process was conducted for each
technology group.
The overall uncertainty in the emission inventory is calculated as indicated in the
following equations:
$$E_i = \sum_{j=1}^{N} c \cdot EF_{i,j} \cdot HR_{i,j} \cdot CP_{i,j} \cdot CL_{i,j} \qquad (5\text{-}5)$$
$$TE = \sum_{i=1}^{m} E_i \qquad (5\text{-}6)$$
where:
$E_i$: Emissions of the ith technology group
$c$: Conversion coefficient (see Equations 5-3 and 5-4)
$EF_{i,j}$: Random emission factor for the ith technology group and jth unit
$HR_{i,j}$: Random heat rate for the ith technology group and jth unit
$CP_{i,j}$: Random capacity factor for the ith technology group and jth unit
$CL_{i,j}$: Capacity load for the ith technology group and jth unit
$N$: Number of units in a technology group
$m$: Number of technology groups
$TE$: Total emissions from all technology groups
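The simulation procedure and the two summations above can be sketched in a few lines of Python. This is an illustrative outline only, not the AUVEE Fortran code. It uses the six-month distributions for T/LNC1 and DB/LNB from Table 5-1 under an assumed reading of the parameter roles (gamma as shape and scale, lognormal as mean and standard deviation of ln(x), beta as two shape parameters), treats the inputs as statistically independent, and uses made-up unit capacities. The names `GROUPS`, `group_emissions`, and `simulate` are hypothetical.

```python
import random
import statistics

C_SIX_MONTH = 0.000005091  # units conversion coefficient from Equation 5-4

# Samplers based on Table 5-1; parameter roles are an assumed reading and
# the unit capacities are hypothetical.
GROUPS = {
    "T/LNC1": {
        "ef": lambda rng: rng.gammavariate(19.03, 8.58),    # g/GJ
        "hr": lambda rng: rng.gauss(10590.0, 848.0),        # BTU/kWh
        "cp": lambda rng: rng.betavariate(6.53, 2.94),      # dimensionless
        "capacities_mw": [200.0, 350.0, 500.0],
    },
    "DB/LNB": {
        "ef": lambda rng: rng.gammavariate(17.25, 10.22),   # g/GJ
        "hr": lambda rng: rng.lognormvariate(9.26, 0.074),  # BTU/kWh
        "cp": lambda rng: rng.betavariate(7.02, 3.18),      # dimensionless
        "capacities_mw": [150.0, 650.0],
    },
}


def group_emissions(group, rng):
    """One realization of E_i (Equation 5-5), in tons per six months."""
    total = 0.0
    for cl in group["capacities_mw"]:
        # One random sample of each input represents one unit.
        ef, hr, cp = group["ef"](rng), group["hr"](rng), group["cp"](rng)
        total += C_SIX_MONTH * ef * hr * cp * cl
    return total


def simulate(groups, n_boot=1000, seed=0):
    """Return B realizations of each E_i and of TE (Equation 5-6)."""
    rng = random.Random(seed)
    per_group = {name: [] for name in groups}
    totals = []
    for _ in range(n_boot):
        te = 0.0
        for name, group in groups.items():
            e_i = group_emissions(group, rng)
            per_group[name].append(e_i)
            te += e_i
        totals.append(te)
    return per_group, totals


per_group, totals = simulate(GROUPS)
mean_te = statistics.fmean(totals)
cuts = statistics.quantiles(totals, n=40)  # cut points at 2.5%, ..., 97.5%
print(f"mean TE = {mean_te:.0f} tons; 95% range = {cuts[0]:.0f} to {cuts[-1]:.0f} tons")
```

Keeping the per-group results alongside the totals for each replication preserves the pairing needed later to relate uncertainty in each technology group to uncertainty in the total inventory.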
5.5 Identifying Key Sources of Uncertainty
The calculation of the importance of uncertainty from different model inputs is
useful because it can indicate which model input contributes most to
uncertainty in a selected model output. Such information helps identify where to target
additional research or data collection to reduce uncertainty in a model input, thereby
leading to a reduction in uncertainty in the model output. In the case study developed in
this project, a method is employed for identifying which of the four technology groups
contributes most to uncertainty in the total emission inventory. The overall emission
inventory can be characterized by using the following equation:
$$EM_{total} = \sum_{i=1}^{n} EM_i \qquad (5\text{-}7)$$
where:
$EM_{total}$: Total emission inventory (tons/year)
$EM_i$: Emission inventory of the ith technology group
$n$: Number of technology groups
There are a variety of measures for evaluating the relative importance of
uncertainties in model inputs (e.g., see Morgan and Henrion, 1990; Cullen and Frey,
1999). The approach employed here is to calculate the sample correlation coefficient
between the distribution of uncertainty in a technology group emission inventory and the
total emission inventory. The sample correlation coefficient is a measure of the linear
dependence of the model output with respect to the selected model input. The sample
correlation between a model output, x, and a model input, y, is calculated as follows:
$$U_{\rho} = \frac{\sum_{k=1}^{m} (x_k - \bar{x})(y_k - \bar{y})}{\sqrt{\sum_{k=1}^{m} (x_k - \bar{x})^{2} \times \sum_{k=1}^{m} (y_k - \bar{y})^{2}}} \qquad (5\text{-}8)$$
where:
$U_{\rho}$: Importance of uncertainty from model input y samples
$x_k$: Model output samples; in this case, $x_k$ can be considered as the total emission inventory
$\bar{x}$: The mean of the $x_k$ samples
$y_k$: Model input samples
$\bar{y}$: The mean of the $y_k$ samples
$m$: Number of samples
A larger magnitude of the uncertainty importance measure, $U_{\rho}$, indicates a stronger linear
dependence between the selected model input and model output.
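The uncertainty importance measure can be sketched as a plain sample correlation between paired samples of a technology group inventory and the total inventory. The example below uses synthetic, normally distributed group inventories with made-up means and standard deviations purely for illustration; they are not results from the case study, and the names `group_a`, `group_b`, and `group_c` are hypothetical.

```python
import math
import random


def sample_correlation(x, y):
    """Pearson sample correlation coefficient between paired samples x and y."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    numerator = sum((xk - x_bar) * (yk - y_bar) for xk, yk in zip(x, y))
    denominator = math.sqrt(
        sum((xk - x_bar) ** 2 for xk in x) * sum((yk - y_bar) ** 2 for yk in y)
    )
    return numerator / denominator


rng = random.Random(42)
m = 2000  # number of paired samples

# Hypothetical uncertainty samples (tons) for three technology groups.
group_a = [rng.gauss(9000.0, 900.0) for _ in range(m)]
group_b = [rng.gauss(6000.0, 300.0) for _ in range(m)]
group_c = [rng.gauss(2000.0, 100.0) for _ in range(m)]

# Total inventory for each sample is the sum over groups.
total = [a + b + c for a, b, c in zip(group_a, group_b, group_c)]

importance = {
    "A": sample_correlation(total, group_a),
    "B": sample_correlation(total, group_b),
    "C": sample_correlation(total, group_c),
}

for name, value in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"group {name}: correlation with total = {value:.3f}")
```

In this synthetic case the group with the largest absolute uncertainty has the highest correlation with the total, which is the behavior the importance measure is meant to reveal.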
In the next chapter, the methods described in this chapter are applied to a case
study for power plant NOx emissions.
6.0 EXAMPLE CASE STUDY
The approach for developing a probabilistic emission inventory using the AUVEE
prototype software is illustrated here using a case study. The case study is based on the
state of North Carolina. This case study was selected because the number of units
representing each of four power plant technologies is dissimilar. The objective of the
case study is to estimate uncertainty in the emissions inventory in the near future. There
are different amounts of uncertainty, based on random sampling error, associated with the
emissions estimates for each of the technologies. Specifically, the following numbers of
units are included in the case study:
- 19 tangential-fired boilers with no NOx controls (T/U)
- 11 tangential-fired boilers using low NOx burners and overfire air option 1 (T/LNC1)
- 12 dry bottom wall-fired boilers with no NOx controls (DB/U)
- 3 dry bottom wall-fired boilers using low NOx burners (DB/LNB)
No units of the technology group with dry bottom turbo-fired boilers and overfire air are
present in the state of North Carolina. Therefore, data for this technology group were not
used in the example case study. The disparate number of units, representing each of the
technologies mentioned above, presents a unique opportunity for understanding the role
of averaging over different numbers of units with respect to uncertainty in emissions for
technology groups and statewide emissions from all technologies.
The uncertainty in the emission inventory can be characterized by the propagation
of probabilistic model inputs through the emission inventory model. For a power plant,
model inputs in the emission inventory include activity factors and emission factors.
Activity factors include heat rate (BTU/kWh), capacity factor, and capacity (MW) for
individual units. In this project, heat rate and capacity factor were probabilistically
characterized. Capacity was assumed to be fixed without uncertainty and variability.
However, the approach could be extended to treat capacities probabilistically if there
were reasons to believe that the reported capacities were in error. Compared to
variability and uncertainty in heat rate and capacity factor, it is unlikely that uncertainty
or variability regarding true plant capacity would play a significant role in most cases,
other than due to data recording errors. All emission factors were characterized
probabilistically.
6.1 Fitting Distributions to Data to Represent Inter-Unit Variability
The case study is based upon a 6-month period, inclusive of summer months.
From the internal database of the prototype AUVEE software, 6-month average data
obtained from the EPA CEMS database were analyzed via the AUVEE user interface.
Parametric probability distributions were fit to each activity and emission factor required
for the inventory. The parameters of the distributions were estimated by AUVEE using
Maximum Likelihood Estimation (MLE). A summary of the emission and activity factor
databases for both six-month and 12-month emission inventories was provided in Tables
3-2 and 3-3, respectively. A summary of the selected parametric distributions and the
estimated values of the parameters was given in Tables 5-1 and 5-2 for the six-month and
12-month databases, respectively.
Examples of the fitted distributions for the example of one technology group are
shown in Figures 6-1, 6-2, and 6-3 for an emission factor, a capacity factor, and a heat
rate, respectively. The inter-unit variability for the selected technology group, tangential-
fired boilers with combustion-based NOx control, is substantial. For example, the
variation in the emission factor for most of the units is from approximately 350 g/GJ to
650 g/GJ. The overall range of variability is from approximately 270 g/GJ to 770 g/GJ.
The capacity factor varies from approximately 0.3 to 0.9, and approximately 70 percent
of the units have capacity factors between approximately 0.6 and 0.9. The heat rate
varies from approximately 9,000 BTU/kWh to 12,000 BTU/kWh.
The fitted distributions are a compact means for representing inter-unit variability.
The goodness-of-fit can be evaluated qualitatively by comparing the fitted distribution
with the data. For example, the Lognormal distribution fitted to the emission factor data
agrees with the tails of the distribution of the data and with the central tendency of the
data. There are some deviations of the fitted distribution from the data in the regions of
the 25th and 75th percentiles, indicating that the fit is not particularly good. In contrast,
the Beta distribution fitted to the capacity factor data agrees very well with the data, as
does the Lognormal distribution fitted to the heat rate data.
[Figure 6-1: cumulative probability versus NOx emission factor (gram/GJ fuel input), showing the data set and the fitted Lognormal distribution.]
Figure 6-1. Comparison of Fitted Lognormal Distribution and Six-Month Average NOx
Emission Factor Data for Tangential-Fired Boilers with NOx Control.

[Figure 6-2: cumulative probability versus capacity factor, showing the data set (n=41) and the fitted Beta distribution.]
Figure 6-2. Comparison of Fitted Beta Distribution and Six-Month Average Capacity
Factor Data for Tangential-Fired Boilers with NOx Control.

[Figure 6-3: cumulative probability versus heat rate (BTU/kWh), showing the data set (n=41) and the fitted Lognormal distribution.]
Figure 6-3. Comparison of Fitted Lognormal Distribution and Six-Month Average Heat
Rate Data for Tangential-Fired Boilers with NOx Control.
6.2 Quantifying Uncertainty in Statistics of the Fitted Distributions
Bootstrap simulation is used to quantify uncertainty in the inputs to the emission
inventory. As noted in Chapter 2, bootstrap simulation was introduced by Efron as a
means for calculating confidence intervals for statistics in a general manner for situations
in which analytical solutions are not available (Efron and Tibshirani, 1993). A
probabilistic framework for calculating uncertainty in emissions estimation using
Bootstrap simulation is described in Chapter 2.
Bootstrap simulation is a numerical method for simulating random sampling
error. In bootstrap simulation, a probability distribution is assumed to be a best estimate
of the true but unknown population distribution for a quantity. For example, the
parametric distributions fit to datasets for inter-unit variability in emissions and activity
data are assumed to be the best estimates of the true but unknown population distribution
for inter-unit variability of these quantities.
Using a random sampling technique, synthetic data sets of the same sample size
as the original observed data set are simulated from the assumed population distribution.
The random sampling technique employed is Monte Carlo simulation. A synthetic
random sample of the same sample size as the original data is referred to as a bootstrap
sample. For each bootstrap sample, a new value of each statistic of interest, such as the
mean, standard deviation, distribution parameters, or fractiles of the distribution, is
calculated. An estimate of a statistic calculated from a bootstrap sample is referred to as
a bootstrap replicate of the statistic.
To obtain an estimate of uncertainty for the selected statistic(s), bootstrap samples
are drawn repeatedly from the assumed population distribution. For example, if the
original data set contained 36 data points, perhaps 500 random samples of 36 data points
would be simulated, and for each of these 500 bootstrap samples, a bootstrap replicate of
the mean is calculated. The 500 bootstrap means describe a sampling distribution for the
mean. A sampling distribution is a probability distribution for a statistic. From the
sampling distribution, probability ranges can be inferred. For example, the 95 percent
probability range for the mean can be estimated.
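
The procedure described above can be sketched in a few lines. This is a minimal illustration, not the AUVEE implementation: the assumed population is a lognormal distribution with hypothetical parameters (mu and sigma of the logarithms), and the sample size of 36 and the 500 bootstrap samples follow the example in the text.

```python
import random
import statistics


def bootstrap_mean_replicates(mu, sigma, n, num_replicates, rng):
    """Simulate bootstrap samples of size n from an assumed lognormal
    population and return one bootstrap replicate of the mean per sample."""
    replicates = []
    for _ in range(num_replicates):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        replicates.append(statistics.fmean(sample))
    return replicates


def probability_range(replicates, level=0.95):
    """Read the lower and upper bounds of a probability range from the
    sorted bootstrap replicates (the sampling distribution)."""
    ordered = sorted(replicates)
    tail = (1.0 - level) / 2.0
    lower = ordered[int(tail * (len(ordered) - 1))]
    upper = ordered[int((1.0 - tail) * (len(ordered) - 1))]
    return lower, upper


rng = random.Random(42)
mu, sigma = 5.0, 0.3  # hypothetical lognormal parameters
replicates = bootstrap_mean_replicates(mu, sigma, n=36, num_replicates=500, rng=rng)
lower, upper = probability_range(replicates)
center = statistics.fmean(replicates)
print(f"mean {center:.1f}, 95 percent range {lower:.1f} to {upper:.1f}")
```

Because the assumed population is skewed, the lower and upper bounds need not be equally far from the mean, which is one reason a numerical method is useful here.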
A summary of uncertainty in the mean emission and activity factors is shown in
Table 6-1 and Table 6-2 for the six-month and 12-month emission inventories,
respectively. In both Table 6-1 and Table 6-2, all five technology groups included in the
internal database are indicated. For each technology group, the number of data points
available for each of the three emission and activity factors is indicated. The mean
value of the available data, and the 95 percent confidence interval for the mean, are
shown.
For example, for Dry Bottom Wall-Fired Boilers with No NOx Controls, there
were 87 data points available for the 6-month emission inventory database. The mean
heat rate is 11,190 BTU/kWh. The 95 percent confidence interval for the mean heat rate
is from 10,880 BTU/kWh to 11,470 BTU/kWh. This is a range of minus 310 BTU/kWh
to plus 280 BTU/kWh, or minus 2.8 percent to plus 2.5 percent with respect to the mean.
The range of uncertainty in the mean capacity factor is from minus 5 percent to plus 7
percent with respect to the mean. The range of uncertainty in the mean emission factor is
from minus 5 percent to plus 7 percent. If the confidence interval had been obtained
using the conventional widely applied analytical approach, the ranges would have been
reported as symmetric. For example, for the emission factor, the standard deviation of
the 87 data points is 90 g/GJ. The standard error of the mean would be estimated as the
standard deviation divided by the square root of the sample size, resulting in an estimated
standard error of 9.6 g/GJ. A 95 percent confidence interval would be enclosed by a
range of plus or minus 1.96 multiples of the standard error of the mean, or a range of plus
or minus 6.5 percent. The asymmetry in the confidence intervals is because of skewness
in the data set from which the mean and the confidence intervals were inferred. The
confidence intervals were obtained using the bootstrap simulation technique described in
Chapter 2. The conventional analytical approach imposes an assumption that the data are
not skewed and, therefore, cannot properly account for skewness in the data.
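
The arithmetic in this comparison can be checked directly. The sketch below uses only values quoted in the text and in Table 6-1 for the emission factor of Dry Bottom Wall-Fired Boilers with No NOx Controls; the variable names are illustrative.

```python
import math

n = 87            # number of data points
mean = 291.0      # mean emission factor, g/GJ (Table 6-1)
std_dev = 90.0    # standard deviation of the data, g/GJ

# Conventional analytical approach: symmetric interval around the mean.
standard_error = std_dev / math.sqrt(n)
half_width = 1.96 * standard_error
analytical_percent = 100.0 * half_width / mean

# Bootstrap interval reported in Table 6-1: 277 to 312 g/GJ.
bootstrap_low_percent = 100.0 * (277.0 - mean) / mean
bootstrap_high_percent = 100.0 * (312.0 - mean) / mean

print(f"standard error: {standard_error:.2f} g/GJ")
print(f"analytical range: +/- {analytical_percent:.1f} percent")
print(f"bootstrap range: {bootstrap_low_percent:.1f} to +{bootstrap_high_percent:.1f} percent")
```

The analytical range is symmetric by construction, while the bootstrap range from the table is wider on the positive side.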
The range of uncertainty in the mean values is a function of both the variability in
the data set and the number of data points. Thus, datasets with larger numbers of data
points tend to have less uncertainty. For example, for the Tangential-Fired Boilers with
No NOx Control, for which there are 136 data points in the six-month database, the range
of uncertainty in the mean emission factor is minus 4.1 percent to plus 5.6 percent,
compared to a range of minus 8.9 percent to plus 8.9 percent for the Dry Bottom Turbo-
Fired Boilers with Overfire Air, for which there were only six data points available.
Table 6-1. Summary of Uncertainty in 6-month Emission Inventory Mean Emission and
Activity Factors Based Upon National Data

Technology / Variable(a)        Number of      Mean      95 Percent Confidence
                                Data Points              Interval(b)
Dry Bottom Wall-Fired Boilers with No NOx Controls
    Heat Rate                   87             11,190    10,880, 11,470
    Capacity Factor             87             0.59      0.56, 0.63
    NOx Emission Rate           87             291       277, 312
Dry Bottom Wall-Fired Boilers with Low NOx Burners
    Heat Rate                   98             10,570    10,440, 10,710
    Capacity Factor             98             0.69      0.66, 0.72
    NOx Emission Rate           98             176       168, 196
Tangential-Fired Boilers with No NOx Controls
    Heat Rate                   136            10,860    10,310, 11,240
    Capacity Factor             136            0.62      0.59, 0.64
    NOx Emission Rate           136            196       188, 207
Tangential-Fired Boilers Using Low NOx Burners & Overfire Air Option 1
    Heat Rate                   41             10,590    10,370, 10,860
    Capacity Factor             41             0.69      0.65, 0.73
    NOx Emission Rate           41             163       153, 176
Dry Bottom Turbo-Fired Boilers with Overfire Air
    Heat Rate                   6              10,420    9,830, 11,200
    Capacity Factor             6              0.71      0.64, 0.77
    NOx Emission Rate           6              191       174, 208

(a) Units: Heat rate (BTU/kWh); Capacity Factor (actual kWh/maximum possible kWh); and NOx Emission Rate (g NOx as NO2/GJ of fuel input).
(b) 95 Percent Confidence Interval for Mean Value
Table 6-2. Summary of Uncertainty in 12-month Emission Inventory Mean Emission
and Activity Factors Based Upon National Data

Technology / Variable(a)        Number of      Mean      95 Percent Confidence
                                Data Points              Interval(b)
Dry Bottom Wall-Fired Boilers with No NOx Controls
    Heat Rate                   84             11,150    10,870, 11,410
    Capacity Factor             84             0.53      0.50, 0.57
    NOx Emission Rate           84             293       278, 310
Dry Bottom Wall-Fired Boilers with Low NOx Burners
    Heat Rate                   98             10,610    10,410, 10,820
    Capacity Factor             98             0.67      0.65, 0.70
    NOx Emission Rate           98             177       169, 185
Tangential-Fired Boilers with No NOx Controls
    Heat Rate                   134            10,780    10,560, 11,020
    Capacity Factor             134            0.56      0.53, 0.59
    NOx Emission Rate           134            198       191, 208
Tangential-Fired Boilers Using Low NOx Burners & Overfire Air Option 1
    Heat Rate                   36             10,730    10,490, 10,990
    Capacity Factor             36             0.65      0.58, 0.71
    NOx Emission Rate           36             161       148, 174
Dry Bottom Turbo-Fired Boilers with Overfire Air
    Heat Rate                   6              10,360    9,610, 11,030
    Capacity Factor             6              0.66      0.62, 0.71
    NOx Emission Rate           6              191       178, 203

(a) Units: Heat rate (BTU/kWh); Capacity Factor (actual kWh/maximum possible kWh); and NOx Emission Rate (g NOx as NO2/GJ of fuel input).
(b) 95 Percent Confidence Interval for Mean Value
The range of uncertainty in the emission and activity factors of the 12-month
database is similar to that of the six-month database. For example, the confidence
interval for the mean emission factor for Dry Bottom Wall-Fired Boilers with No NOx
Controls has a range of minus 5.1 percent to plus 5.8 percent with respect to the mean.
This is similar to the range of uncertainty for the six-month database. While there are
some specific quantitative differences in the ranges of uncertainty in the mean when
comparing the six-month and the 12-month databases, the differences are generally not
substantial.
6.3 Evaluating Goodness-of-Fit Using Bootstrap Results
Bootstrap simulation can be used to help evaluate the goodness of a fit of a
distribution with respect to the original data. Confidence intervals for the fitted
distribution can be estimated and compared with the original data.
For example, Figures 6-4, 6-5, and 6-6 show a comparison of confidence intervals
for the fitted distribution with the datasets for the emission factor, capacity factor, and
heat rate, respectively, for one technology group. The width of the confidence intervals
can be compared to the range of variability in the data to gain insight regarding the
relative degree of uncertainty. For example, the width of the 95 percent probability band
in Figure 6-4 spans approximately 50 g/GJ to 100 g/GJ for most percentiles of the fitted
distribution. Compared to a range of variability in the data of approximately 500 g/GJ
when comparing the difference in the emission rate between the smallest and largest
emission factors in the data set, it appears that the uncertainty is relatively small
compared to the range of inter-unit variability in emissions. For this particular data set,
there are 41 data points, which is a relatively large sample size. For datasets with smaller
sample size, the range of uncertainty is typically larger. The range of uncertainty is
influenced both by the variability in the dataset and by the sample size.
In Figure 6-4, it appears that most of the data are contained within the 95 percent
confidence interval; however, few of the data are contained within the 50 percent
confidence interval. Thus, it appears that the Lognormal distribution may adequately
describe the inter-unit variability in emissions for some data quality criteria, but perhaps
not for others. Later, we will return to consider whether this particular input was
important to the overall estimate of uncertainty in the inventory.
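
A comparison of this kind can be sketched as follows. This is an illustration under stated assumptions, not the AUVEE algorithm: the data are synthetic, the distribution is a lognormal fitted by MLE on the logarithms, each data point is placed at a Hazen plotting position (i - 0.5)/n, and a point counts as enclosed when its value lies inside the band of bootstrap fractiles at that cumulative probability.

```python
import math
import random
import statistics


def fit_lognormal(data):
    """MLE for a lognormal: mean and population std. dev. of the logarithms."""
    logs = [math.log(v) for v in data]
    return statistics.fmean(logs), statistics.pstdev(logs)


def bootstrap_fits(mu, sigma, n, num_replicates, rng):
    """Refit the lognormal to each synthetic bootstrap sample of size n."""
    fits = []
    for _ in range(num_replicates):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(n)]
        fits.append(fit_lognormal(sample))
    return fits


def band_at(fits, p, level):
    """Probability band for the fractile at cumulative probability p."""
    q = sorted(math.exp(statistics.NormalDist(m, s).inv_cdf(p)) for m, s in fits)
    tail = (1.0 - level) / 2.0
    lower = q[int(tail * (len(q) - 1))]
    upper = q[int(math.ceil((1.0 - tail) * (len(q) - 1)))]
    return lower, upper


def fraction_enclosed(data, fits, level):
    """Fraction of data points that fall inside the band at their plotting position."""
    ordered = sorted(data)
    n = len(ordered)
    inside = 0
    for i, value in enumerate(ordered, start=1):
        p = (i - 0.5) / n  # Hazen plotting position (an assumption)
        lower, upper = band_at(fits, p, level)
        inside += lower <= value <= upper
    return inside / n


rng = random.Random(1)
data = [rng.lognormvariate(6.2, 0.25) for _ in range(41)]  # synthetic data, n=41
mu_hat, sigma_hat = fit_lognormal(data)
fits = bootstrap_fits(mu_hat, sigma_hat, len(data), 500, rng)
frac_50 = fraction_enclosed(data, fits, 0.50)
frac_95 = fraction_enclosed(data, fits, 0.95)
print(f"enclosed by 50 percent band: {frac_50:.2f}, by 95 percent band: {frac_95:.2f}")
```

A fit would be judged consistent with the data when roughly half of the points fall in the 50 percent band and nearly all fall in the 95 percent band.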
[Figure 6-4: cumulative probability versus NOx emission factor (gram/GJ fuel input), showing the data set, the fitted Lognormal distribution, and the 50, 90, and 95 percent confidence intervals.]
Figure 6-4. Probability Bands Representing Uncertainty in the Parametric Distribution
Fitted to NOx Emission Factor Data for T/LNC1 (n=41)

[Figure 6-5: cumulative probability versus capacity factor, showing the data set, the fitted Beta distribution, and the 50, 90, and 95 percent confidence intervals.]
Figure 6-5. Probability Bands Representing Uncertainty in the Parametric Distribution
Fitted to Capacity Factor Data for T/LNC1 (n=41)

[Figure 6-6: cumulative probability versus heat rate (BTU/kWh), showing the data set, the fitted Lognormal distribution, and the 50, 90, and 95 percent confidence intervals.]
Figure 6-6. Probability Bands Representing Uncertainty in the Parametric Distribution
Fitted to Heat Rate Data for T/LNC1 (n=41)
For the other two cases, the fitted distributions agree very well with the data. For
example, more than half of the data are enclosed by the 50 percent confidence intervals,
and all but one or two data points out of 41 are contained within the 95 percent
confidence intervals. Thus, the fits in these two cases are reasonably good ones. From
these comparisons, which the user may view via the AUVEE GUI, one may conclude that
the fitted distributions adequately characterize inter-unit variability.
A summary of the comparison of the probability bands of the fitted distributions
with the data for the emission and activity factors for the six-month and 12-month
emission inventories is given in Table 6-3 and Table 6-4, respectively.
For each variable shown in Table 6-3, it is desired that, on average, 50 percent of
the data should be enclosed by the 50 percent probability range for the fitted parametric
distribution. In addition, it is desired that, on average, 95 percent of the data are enclosed
by the 95 percent probability range of the fitted parametric distribution. In most cases,
the data appear to be consistent with the fitted distribution. For example, in the case of
capacity factor for the uncontrolled dry bottom boiler (DB/U) group, 54 percent of the
data are enclosed by the 50 percent probability range, and all of the data are enclosed by
the 95 percent probability range. In fact, for seven of the 15 variables represented in
Table 6-3, more than half of the data are enclosed by the 50 percent probability range and
more than 95 percent of the data are enclosed by the 95 percent probability range of the
fitted cumulative distribution function. In nine of the 15 variables, all of the data are
enclosed by the 95 percent probability range of the fitted CDF, and in 11 of the 15
variables, at least 95 percent of the data are enclosed by the 95 percent probability range.
Thus, in most cases, it appears that the fitted distributions agree with the data to a
reasonable extent. One of the few cases of relatively poor agreement was illustrated in
Figure 6-4.
For the 12-month database, 95 percent or more of the data are enclosed by the 95
percent probability range of the fitted distribution in 9 of 15 cases, and 90 percent or
more of the data are enclosed by the 95 percent probability range in 12 of the 15 cases.
Thus, in most cases, there is reasonable agreement between the data and the fitted
distributions.
Table 6-3. Summary of the Goodness-of-Fit of Parametric Distributions Fitted to
Emission and Activity Factor Data for a Six-Month Emission Inventory Based Upon
Evaluation of the Proportion of Data Enclosed by the 50 Percent and 95 Percent
Probability Bands of the Fitted Cumulative Distribution Function.

                                                     Fraction of Data Enclosed by:
Technology    Input Variable      Fitted             50 Percent           95 Percent
Group                             Distribution       Probability Range    Probability Range
DB/U          Heat Rate           Lognormal          0.26                 0.97
DB/U          Capacity Factor     Beta               0.54                 1.0
DB/U          Emission Factor     Weibull            0.18                 0.77
DB/LNB        Heat Rate           Lognormal          0.58                 1.0
DB/LNB        Capacity Factor     Beta               0.61                 1.0
DB/LNB        Emission Factor     Gamma              0.57                 1.0
T/U           Heat Rate           Lognormal          0.19                 0.89
T/U           Capacity Factor     Beta               0.66                 1.0
T/U           Emission Factor     Lognormal          0.44                 0.99
T/LNC1        Heat Rate           Normal             0.41                 1.0
T/LNC1        Capacity Factor     Beta               0.75                 1.0
T/LNC1        Emission Factor     Gamma              0.17                 0.77
DTF/OFA       Heat Rate           Lognormal          0.33                 1.0
DTF/OFA       Capacity Factor     Beta               0.67                 1.0
DTF/OFA       Emission Factor     Gamma              0.17                 0.83
Table 6-4. Summary of the Goodness-of-Fit of Parametric Distributions Fitted to
Emission and Activity Factor Data for a 12-Month Emission Inventory Based Upon
Evaluation of the Proportion of Data Enclosed by the 50 Percent and 95 Percent
Probability Bands of the Fitted Cumulative Distribution Function.

                                                     Fraction of Data Enclosed by:
Technology    Input Variable      Fitted             50 Percent           95 Percent
Group                             Distribution       Probability Range    Probability Range
DB/U          Heat Rate           Lognormal          0.24                 0.94
DB/U          Capacity Factor     Beta               0.88                 1.0
DB/U          Emission Factor     Weibull            0.24                 0.79
DB/LNB        Heat Rate           Lognormal          0.29                 0.91
DB/LNB        Capacity Factor     Beta               0.37                 0.98
DB/LNB        Emission Factor     Gamma              0.47                 0.99
T/U           Heat Rate           Lognormal          0.24                 0.82
T/U           Capacity Factor     Beta               0.77                 1.0
T/U           Emission Factor     Gamma              0.44                 0.92
T/LNC1        Heat Rate           Lognormal          0.42                 1.0
T/LNC1        Capacity Factor     Beta               0.28                 0.98
T/LNC1        Emission Factor     Gamma              0.17                 0.78
DTF/OFA       Heat Rate           Lognormal          0.33                 1.0
DTF/OFA       Capacity Factor     Normal             0.33                 1.0
DTF/OFA       Emission Factor     Lognormal          0.5                  1.0
6.4 Quantifying Uncertainty in the Inputs to an Emission Inventory
After the user has entered data regarding the number of units of each technology
group that are included in the inventory, a simulation of uncertainty specific to the
particular inventory may be performed. For example, in the example inventory, there are
only 11 units of the specific technology group represented in Figures 6-4, 6-5, and 6-6.
Thus, although there are a total of 41 such units represented in the database for the six-
month emission inventory, the uncertainty estimate specific to the example inventory
must account for the fact that there are only 11 units in the inventory. An assumption is
that the 11 units are a random sample of the population of all units of the same
technology group. The uncertainty in the mean emission rate among 11 units should be
based upon a sample size of 11 and not a sample size of 41. In other words, if the 11
units are a random sample from the population, then the sampling distribution for the
mean of the 11 units must reflect stochastic variation in the mean for a random sample of
only 11. Therefore, bootstrap simulation with bootstrap samples of 11 synthetic data
points is used to quantify uncertainty in the distribution used to describe inter-unit
variability in emissions for a sample of 11 units.
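
The effect of the bootstrap sample size can be illustrated with a small sketch. The lognormal parameters below are hypothetical and stand in for a fitted emission factor distribution; only the sample sizes of 41 and 11 come from the text.

```python
import random
import statistics


def mean_interval(mu, sigma, n, num_replicates, rng):
    """95 percent probability range for the mean of n units drawn from an
    assumed lognormal population, estimated by bootstrap simulation."""
    means = sorted(
        statistics.fmean(rng.lognormvariate(mu, sigma) for _ in range(n))
        for _ in range(num_replicates)
    )
    lower = means[int(0.025 * (num_replicates - 1))]
    upper = means[int(0.975 * (num_replicates - 1))]
    return lower, upper


rng = random.Random(2)
mu, sigma = 5.05, 0.30  # hypothetical fitted parameters

low_41, high_41 = mean_interval(mu, sigma, 41, 2000, rng)  # units in the database
low_11, high_11 = mean_interval(mu, sigma, 11, 2000, rng)  # units in the inventory

print(f"n=41: {low_41:.0f} to {high_41:.0f}")
print(f"n=11: {low_11:.0f} to {high_11:.0f}")
```

The range for 11 units is wider than the range for 41 units even though both are drawn from the same assumed population distribution.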
Examples of results for uncertainty based upon the number of units actually in the
inventory are shown in Figures 6-7, 6-8, and 6-9 for the emission factor, capacity factor,
and heat rate, respectively, of one of the four technology groups. In comparing
Figure 6-7 with Figure 6-4, it is apparent that the confidence intervals are much wider in
Figure 6-7. The increased width of the confidence intervals in Figure 6-7 corresponds to
the smaller sample size of 11 versus 41, the latter of which is the basis for the bootstrap
simulation results shown in Figure 6-4. With a random sample of only 11, there is more
random fluctuation in the mean, median, standard deviation, parameter values, fractiles,
and other statistics that may be calculated from the bootstrap samples. With a smaller
number of units, the range of uncertainty is larger. Similar results are obtained for the
activity factors when comparing Figure 6-8 versus Figure 6-5 for capacity factor, and
when comparing Figure 6-9 versus Figure 6-6 for heat rate.
[Figure 6-7: cumulative probability versus NOx emission factor (gram/GJ fuel input), showing the fitted Lognormal distribution and the 50, 90, and 95 percent confidence intervals.]
Figure 6-7. Probability Bands Based Upon Number of Units in the Emission Inventory
(n=11) for the Example of the Emission Factor of the T/LNC1 Technology Group.

[Figure 6-8: cumulative probability versus capacity factor, showing the fitted Beta distribution and the 50, 90, and 95 percent confidence intervals.]
Figure 6-8. Probability Bands Based Upon Number of Units in the Emission Inventory
(n=11) for the Example of Capacity Factor of the T/LNC1 Technology Group.

[Figure 6-9: cumulative probability versus heat rate (BTU/kWh), showing the fitted Lognormal distribution and the 50, 90, and 95 percent confidence intervals.]
Figure 6-9. Probability Bands Based Upon Number of Units in the Emission Inventory
(n=11) for the Example of Heat Rate of the T/LNC1 Technology Group.
A summary of the uncertainty in the mean emission and activity factors for the
example case study is given in Table 6-5 for the six-month emission inventory inputs and
in Table 6-6 for the 12-month emission inventory inputs. These two tables can be
compared with Tables 6-1 and 6-2, respectively. It is apparent that the 95 percent
probability ranges for the uncertainty estimates of the mean are larger with the smaller
sample sizes of the case study (for example, 11 units) than with a sample size based upon
the total amount of data available nationally.
For example, for the T/LNC1 technology group, the 95 percent confidence
interval for the mean emission factor based upon the 41 units in the national database was
minus 6.1 percent to plus 8.0 percent with respect to the mean value. For a random
sample of 11 units, the 95 percent probability range for the mean is from minus 14.8
percent to plus 13.5 percent with respect to the mean.
The 95 percent confidence interval for the mean is not reported for the dry bottom
boiler units with NOx controls (DB/LNB) because only three units of this type are included in the
example inventory. At this time, the prototype AUVEE software will not report confidence
intervals or probability bands if the number of units is less than or equal to three.
However, in developing the probabilistic emission inventory, the emission and activity
factors for individual units are sampled at random from the assumed population
distribution using the method described in Chapter 5.
Table 6-5. Summary of Uncertainty in 6-month Emission Inventory Mean Emission and
Activity Factors Based Upon the Number of Units in the Example Case Study

Technology    Variable(a)           Number of    Mean      95 Percent Confidence
Group                               Units                  Interval on Mean
DB/U          Heat Rate             12           11,190    10,480, 11,890
DB/U          Capacity Factor       12           0.59      0.49, 0.68
DB/U          NOx Emission Rate     12           291       248, 344
DB/LNB        Heat Rate             3            10,570    NA
DB/LNB        Capacity Factor       3            0.69      NA
DB/LNB        NOx Emission Rate     3            176       NA
T/U           Heat Rate             19           10,860    9,540, 12,060
T/U           Capacity Factor       19           0.62      0.54, 0.69
T/U           NOx Emission Rate     19           196       173, 225
T/LNC1        Heat Rate             11           10,590    10,110, 11,070
T/LNC1        Capacity Factor       11           0.69      0.60, 0.78
T/LNC1        NOx Emission Rate     11           163       142, 185

(a) Units: Heat rate (BTU/kWh); Capacity Factor (actual kWh/maximum possible kWh); and NOx Emission Rate (g NOx as NO2/GJ of fuel input).
Table 6-6. Summary of Uncertainty in 12-month Emission Inventory Mean Emission
and Activity Factors Based Upon the Number of Units in the Example Case Study

Technology    Variable(a)           Number of    Mean      95 Percent Confidence
Group                               Units                  Interval on Mean
DB/U          Heat Rate             12           11,150    10,460, 11,930
DB/U          Capacity Factor       12           0.53      0.42, 0.63
DB/U          NOx Emission Rate     12           293       251, 333
DB/LNB        Heat Rate             3            10,610    NA
DB/LNB        Capacity Factor       3            0.67      NA
DB/LNB        NOx Emission Rate     3            177       NA
T/U           Heat Rate             19           10,780    10,240, 10,430
T/U           Capacity Factor       19           0.56      0.48, 0.65
T/U           NOx Emission Rate     19           198       177, 222
T/LNC1        Heat Rate             11           10,730    10,280, 11,220
T/LNC1        Capacity Factor       11           0.65      0.52, 0.78
T/LNC1        NOx Emission Rate     11           161       137, 187

(a) Units: Heat rate (BTU/kWh); Capacity Factor (actual kWh/maximum possible kWh); and NOx Emission Rate (g NOx as NO2/GJ of fuel input).
6.5 Propagating Uncertainty in Emission Inventory Inputs to Predict Uncertainty in Emission Inventory Outputs
To estimate uncertainty in the total emissions for the inventory, emissions for
individual units of each technology group are simulated, as described in Chapter 5. For
example, if there are 11 units in a technology group, then 11 random samples are
simulated from each of the fitted distributions for emission factor, capacity factor, and
heat rate. Each of the 11 sets of sampled values is paired with one of the 11 user-entered
values for unit capacities. Emissions for each of the 11 units are calculated and summed to
represent one possible realization of total emissions for the technology group. This
process is repeated many times for the technology group to develop hundreds or
thousands of estimates of total emissions within the group. The distribution of values of
the total emissions for the group represents uncertainty in the total emissions. This
process is repeated for all technology groups in the inventory. The uncertainty in the
inventory, inclusive of all technology groups, is calculated by summing the emissions
from each technology group for each of the hundreds or thousands of realizations of
uncertainty, to create hundreds or thousands of alternative random estimates of the
emission inventory.
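
The simulation for one technology group can be sketched as follows. Everything numeric here is an assumption made for illustration: the distribution families and parameters stand in for the fitted distributions, all 11 units are given a capacity of 350 MW, a six-month period is taken as 4,380 hours, and emissions are expressed in short tons. The unit conversion written out in the code plays the role of the conversion coefficient c, whose actual value is defined elsewhere in the report.

```python
import math
import random
import statistics

HOURS = 4380.0              # hours in a six-month period (assumed)
GJ_PER_BTU = 1.055056e-6    # energy conversion
GRAMS_PER_TON = 907_184.74  # short ton (assumed reporting unit)


def simulate_group_total(capacities_mw, rng):
    """One realization of total emissions (tons) for a technology group."""
    total_grams = 0.0
    for capacity in capacities_mw:
        ef = rng.lognormvariate(math.log(160.0), 0.25)     # g/GJ, hypothetical
        hr = rng.lognormvariate(math.log(10_500.0), 0.06)  # BTU/kWh, hypothetical
        cf = rng.betavariate(6.0, 3.0)                     # capacity factor, hypothetical
        energy_gj = capacity * 1000.0 * HOURS * cf * hr * GJ_PER_BTU
        total_grams += ef * energy_gj
    return total_grams / GRAMS_PER_TON


rng = random.Random(3)
capacities = [350.0] * 11  # hypothetical user-entered unit capacities (MW)

# Repeat the realization many times to build the distribution of uncertainty.
realizations = sorted(simulate_group_total(capacities, rng) for _ in range(2000))
mean_total = statistics.fmean(realizations)
p025 = realizations[int(0.025 * (len(realizations) - 1))]
p975 = realizations[int(0.975 * (len(realizations) - 1))]
print(f"mean {mean_total:,.0f} tons, 95 percent range {p025:,.0f} to {p975:,.0f} tons")
```

Summing the realizations of several groups, realization by realization, would give the corresponding distribution for the whole inventory.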
6.5.1 Uncertainty Results for the Example Six Month Emission Inventory
Figure 6-10 illustrates an uncertainty estimate for the total six month emissions
from one technology group. In this case, the emissions are from 11 units. The mean
value of the emissions is 25,200 tons of NOx emitted over a six month period. The 2.5th
percentile of the distribution of uncertainty in emissions is 19,800 tons of NOx emitted
over a six month period. The 97.5th percentile is 31,100 tons of NOx emitted over a six
month period. The 2.5th and 97.5th percentiles enclose the 95 percent probability range.
Expressed on a relative basis, the 95 percent probability range for uncertainty is minus 21
percent to plus 23 percent with respect to the mean value.
The range of uncertainty in the emissions for the example technology group is
slightly asymmetric, reflecting the fact that many of the inputs to the emission inventory
have skewed distributions (e.g., as in the case of the Lognormal distribution fit to the
emission factor data) and small sample sizes (e.g., n=11). The range of uncertainty
reflects the large amount of inter-unit variability in the inputs to the inventory. For
example, as mentioned in regard to Figures 6-1, 6-2, and 6-3, there is substantial inter-
unit variability in the emission factor, capacity factor, and heat rate. The wide range of
variation in performance and operation of these types of units is reflected in the
comparatively wide range of uncertainty for the total emissions of this technology group.
The overall uncertainty in the six month emission inventory, inclusive of all four
technology groups considered, is shown in Figure 6-11. The estimated mean emission
rate is 84,800 tons of NOx emitted in a six month period. The 95 percent probability
range is enclosed by emissions of 71,800 tons and 99,900 tons. This is a range of -13,000
tons to +15,100 tons, or -15 percent to +18 percent, with respect to the mean. The
asymmetry of the 95 percent probability range is a result of skewness in many of the
input assumptions among the four technology groups.
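
As a check on the arithmetic, the relative range follows from the reported percentiles and mean:

$$\frac{71{,}800 - 84{,}800}{84{,}800} \approx -0.15, \qquad \frac{99{,}900 - 84{,}800}{84{,}800} \approx +0.18$$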
A summary of the uncertainty results for the entire six-month emission inventory
is given in Table 6-7. Although the absolute range of uncertainty for the total inventory
is greater than the absolute range of uncertainty for the selected technology group, the
relative range of uncertainty is smaller. While this result may seem counter-intuitive, the
result occurs because the uncertainty in emissions for each technology group is assumed
to be statistically independent of the other technology groups. There is no compelling
reason to assume, for example, that if emissions are high at a particular tangential-fired
boiler, that they must also be high at a dry-bottom boiler also located in the region of the
inventory.
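
The effect of independence can be sketched with a short derivation. The symbols μ_i and σ_i, the mean and standard deviation of the emissions for technology group i, are introduced here only for illustration. If the group emissions are statistically independent, the variances add:

$$\sigma_{total}^2 = \sum_{i=1}^{n} \sigma_i^2, \qquad \mu_{total} = \sum_{i=1}^{n} \mu_i$$

so that

$$\sigma_{total} \ge \sigma_i \ \text{for every } i, \qquad \frac{\sigma_{total}}{\mu_{total}} = \frac{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}{\sum_{i=1}^{n} \mu_i} \le \max_i \frac{\sigma_i}{\mu_i}$$

The absolute spread of the total is at least as large as that of any single group, while the relative spread of the total cannot exceed the largest relative spread among the groups.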
A property of probabilistic simulations is that, in general, it is not possible to sum
the values of selected percentiles of each model input to obtain an estimate of the same
percentile of the model output. For example, the 2.5th percentile of the total emission
inventory, which is 71,800 tons, does not correspond to a sum of the 2.5th percentile of
each of the four technology groups. However, for a model that is a sum of its inputs, the
sum of the means is the same as the mean of the sum; this holds even if the model inputs
are correlated, although correlation does affect the percentiles of the sum.
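
This property can be checked against Table 6-7 and with a small simulation. The table values below are copied from Table 6-7; the simulation uses four hypothetical, independent lognormal distributions that are not the case study inputs.

```python
import random
import statistics


def p025(values):
    """2.5th percentile read from the sorted values."""
    ordered = sorted(values)
    return ordered[int(0.025 * (len(ordered) - 1))]


# (2.5th percentile, mean) in tons per six months, from Table 6-7.
table = {
    "DB/U": (21_700, 31_100),
    "DB/LNB": (5_600, 8_100),
    "T/U": (15_300, 20_400),
    "T/LNC1": (19_800, 25_200),
}
table_sum_p025 = sum(p for p, _ in table.values())   # compare with 71,800 for the total
table_sum_means = sum(m for _, m in table.values())  # compare with 84,800 for the total

# Hypothetical simulation with four independent groups.
rng = random.Random(4)
num_realizations = 5000
groups = [
    [rng.lognormvariate(10.0, 0.15) for _ in range(num_realizations)]
    for _ in range(4)
]
totals = [sum(draws) for draws in zip(*groups)]

sum_of_p025 = sum(p025(g) for g in groups)
p025_of_total = p025(totals)
sum_of_means = sum(statistics.fmean(g) for g in groups)
mean_of_total = statistics.fmean(totals)

print(f"table: sum of 2.5th percentiles {table_sum_p025:,}, sum of means {table_sum_means:,}")
print(f"simulated: sum of 2.5th percentiles {sum_of_p025:,.0f}, 2.5th percentile of total {p025_of_total:,.0f}")
```

In both the table and the simulation, the group means add up to the mean of the total, while the sum of the group 2.5th percentiles falls below the 2.5th percentile of the total.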
81
Figure 6-10. Uncertainty in a Six-Month NOx Emission Inventory for an Individual Technology Group (T/LNC1) Comprised of 11 Units. [Plot of cumulative probability (0.0 to 1.0) versus NOx emission inventory for T/LNC1 (tons/6 month), with the 95 percent probability range marked.]
Figure 6-11. Uncertainty in a Six-Month NOx Emission Inventory Inclusive of Four Technology Groups. [Plot of cumulative probability (0.0 to 1.0) versus total emission inventory (tons/6 month), with the 95 percent probability range marked.]
Table 6-7. Summary of Uncertainty Results for the Six Month Emission Inventory Case Study

                                                              Random Error (%)a
Technology   2.5th Percentile   Mean      97.5th Percentile   Negative   Positive
Group        (tons)             (tons)    (tons)
DB/U          21,700            31,100     40,100             30         29
DB/LNB         5,600             8,100     11,400             31         39
T/U           15,300            20,400     28,600             25         40
T/LNC1        19,800            25,200     31,100             21         23
Total         71,800            84,800     99,900             15         18

a Results shown are the relative uncertainty ranges for a 95 percent probability range, given with respect to the mean value.
6.5.2 Uncertainty Results for the Example Twelve Month Emission
Inventory
Figure 6-12 illustrates an uncertainty estimate for the total twelve month
emissions from one technology group. In this case, the emissions are from 11 units. The
mean value of the emissions is 47,200 tons of NOx emitted over a twelve month period. The
2.5th percentile of the distribution of uncertainty in emissions is 33,400 tons of NOx
emitted over a twelve month period. The 97.5th percentile is 62,300 tons of NOx emitted
over a twelve month period. The 2.5th and 97.5th percentiles enclose the 95 percent
probability range. Expressed on a relative basis, the 95 percent probability range for
uncertainty is minus 29 percent to plus 32 percent with respect to the mean value. The
relative range of uncertainty for the 12-month inventory is somewhat larger, in this case,
than was the relative range of uncertainty for the six-month inventory for the same
technology group. This may be attributable, in part, to the fact that electrical load tends
to be higher during the summer months represented by the six month inventory. Therefore,
there may be less variability in plant activity during the summer months when compared
with an annual time frame.
The range of uncertainty in the emissions for the example technology group is
slightly asymmetric, similar to the results obtained for the six month inventory.
The overall uncertainty in the twelve month emission inventory, inclusive of all four
technology groups considered, is shown in Figure 6-13. The estimated mean emission
rate is 157,400 tons of NOx emitted in a twelve month period. The 95 percent probability
range is enclosed by emissions of 132,200 tons and 186,600 tons. This is a range of
-25,200 tons to +29,200 tons, or -16 percent to +19 percent, with respect to the mean. The
asymmetry of the 95 percent probability range is a result of skewness in many of the
input assumptions among the four technology groups. The overall range of uncertainty
for the 12 month inventory inclusive of all technology groups is very similar, on a
relative basis, to the overall range of uncertainty for the six month inventory.
Figure 6-12. Uncertainty in a 12-Month NOx Emission Inventory for an Individual Technology Group (T/LNC1) Comprised of 11 Units. [Plot of cumulative probability (0.0 to 1.0) versus NOx emission inventory for T/LNC1 (tons/12 month), with the 95 percent probability range marked.]
Figure 6-13. Uncertainty in a Twelve-Month NOx Emission Inventory Inclusive of Four Technology Groups. [Plot of cumulative probability (0.0 to 1.0) versus total NOx emission inventory (tons/12 month), with the 95 percent probability range marked.]
Table 6-8. Summary of Uncertainty Results for the Twelve Month Emission Inventory Case Study

                                                              Random Error (%)a
Technology   2.5th Percentile   Mean      97.5th Percentile   Negative   Positive
Group        (tons)             (tons)    (tons)
DB/U          40,200             56,900     75,100            29         32
DB/LNB        11,500             16,200     22,100            29         37
T/U           27,600             37,100     50,800            26         37
T/LNC1        33,400             47,200     62,300            29         32
Total        132,200            157,400    186,600            16         19

a Results shown are the relative uncertainty ranges for a 95 percent probability range, given with respect to the mean value.
A summary of the uncertainty results for the entire twelve-month emission inventory
is given in Table 6-8. Although the absolute range of uncertainty for the total inventory
is greater than the absolute range of uncertainty for the selected technology group, the
relative range of uncertainty is smaller. This is similar to the results for the six month
inventory.
It should be noted that the twelve month inventory results cannot be obtained
simply by multiplying the results of the six month inventory by two. The 12-month
inventory includes data for all four quarters of the year, and thus represents activities and
emissions over all seasons of the year. In contrast, the six month inventory represents
emissions and activity only for the summer months.
6.6 Identifying Key Sources of Uncertainty in the Inventory
A method for identifying which technology groups contribute the most to
uncertainty in the overall emission inventory is included in AUVEE. The method is
based upon calculating the correlation between the uncertainty in emissions from an
individual group and the uncertainty in total emissions. The method is described in
Section 5.4. The correlation is a measure of the linear covariation of the two uncertainty
distributions. The larger the magnitude of the correlation, the stronger the linear
dependence between the two.
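The idea of ranking groups by their correlation with the total can be sketched as follows. This is an illustrative Monte Carlo calculation, not the procedure of Section 5.4: it assumes independent, normally distributed group emissions, with means taken from Table 6-7 and standard deviations approximated as the width of each 95 percent range divided by 3.92. The helper name pearson and the seed are also illustrative choices.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# (mean, standard deviation) in tons of NOx. Means are from Table 6-7; the
# standard deviations are an assumption: (97.5th - 2.5th percentile) / 3.92.
groups = {
    "DB/U": (31_100, 4_694),
    "DB/LNB": (8_100, 1_480),
    "T/U": (20_400, 3_393),
    "T/LNC1": (25_200, 2_883),
}

rng = random.Random(2001)  # fixed seed so the sketch is repeatable
n_samples = 20_000
draws = {
    name: [rng.gauss(mean, sd) for _ in range(n_samples)]
    for name, (mean, sd) in groups.items()
}
total = [sum(values) for values in zip(*draws.values())]

# Correlation of each group's simulated emissions with the simulated total.
importance = {name: pearson(values, total) for name, values in draws.items()}
for name, r in sorted(importance.items(), key=lambda item: -item[1]):
    print(f"{name:7s} {r:.2f}")
```

For independent inputs, the correlation of each group with the total is expected to be close to the ratio of the group standard deviation to the standard deviation of the total, so the group with the widest absolute range of uncertainty ranks first.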
For the six month inventory, the relative importance of each of the four
technology groups with respect to uncertainty in the total emission inventory is illustrated
in Figure 6-14. Of the four technology groups, the dry-bottom, uncontrolled (DB/U)
group has the strongest correlation with uncertainty in the total emission inventory, with a
correlation coefficient of approximately 0.7. In contrast, the controlled tangential boiler
group used as the basis for the examples in Figures 6-1 through 6-10 has a correlation of
approximately 0.45, and was only the third most important of the four groups in
contributing to uncertainty in the total inventory.
As noted earlier, the fitted distribution for the controlled tangential boiler group
emission factor was not a particularly good fit to the data. However, given that this
particular group is only the third most important contributor to uncertainty in the total
inventory, the discrepancies in the fit are not likely to contribute substantially to errors in
the overall estimate of uncertainty in the inventory.
For the twelve month inventory, the relative importance of each of the four
technology groups with respect to uncertainty in the total emission inventory is illustrated
in Figure 6-15. The results are similar to those for the six month emission inventory.
The implication of the results of the analysis of uncertainty importance is that the
most effective way to reduce uncertainty in the overall emission inventory is to begin by
reducing uncertainty in the estimated emissions from the dry bottom, uncontrolled
technology group. Uncertainty can be reduced by collecting more data or by collecting
better data. However, in prioritizing data collection efforts, the cost of data collection
must also be considered.
Figure 6-14. Relative Importance of Uncertainty in Emissions from Individual Technology Groups with Respect to Overall Uncertainty in the Total Emission Inventory: Results from the Six-Month Emission Inventory Case Study. [Bar chart of correlation coefficient (0.0 to 0.8) for each technology group: DB/U, DB/LNB, T/U, T/LNC1.]
Figure 6-15. Relative Importance of Uncertainty in Emissions from Individual Technology Groups with Respect to Overall Uncertainty in the Total Emission Inventory: Results from the Twelve-Month Emission Inventory Case Study. [Bar chart of correlation coefficient (0.0 to 0.8) for each technology group: DB/U, DB/LNB, T/U, T/LNC1.]
7.0 CONCLUSIONS
This project has demonstrated a prototype software environment for calculation of
probabilistic emission inventories. The prototype software enables a user to visualize, in
the form of empirical probability distributions, the data used to develop the inventory.
Therefore, the user is able to observe the range of variability in the data. This is in sharp
contrast to typical emission inventory work, in which point estimate values of
emission factors are used to calculate a single estimate of the inventory. The range of
variability in the example datasets was shown to be large. For example, the range of
inter-unit variability in emission factors for one technology group was a factor of
approximately three from the smallest to the largest value in the dataset.
Although it is not possible to quantify all sources of uncertainty, it is important to
quantify as many sources of uncertainty as is practical. The example case study
demonstrates that the range of uncertainty attributable to random sampling error is
substantial. For individual technology groups, the range of uncertainty is as large as
approximately plus or minus 30 percent, and for the total inventory the range of
uncertainty is approximately plus or minus 15 percent. These ranges of uncertainty are
likely to be substantially larger than measurement errors in the data. The case study is
based upon a relatively large sample of continuous emission monitoring data. Therefore,
it is likely that the data used in the case study are reasonably representative of actual
emissions among the population of units for the technology groups studied. For the case
study here, it is likely that random sampling error is the most important contributor to
overall uncertainty.
The estimates of uncertainty reflect the lack of information that an emissions
estimator would have regarding future emissions for the selected source category. As
noted earlier in this report, it is now possible to have a high degree of certainty regarding
recent actual emissions at power plants equipped with CEM equipment. However, given
the inherent variability in emissions from one unit to another, and at a single unit over
time, it is not possible to have certainty regarding what the emissions will be at a future
time, whether in the near or distant future. In estimating distant future emissions, an
additional refinement that may be needed in the case study would be to consider changes
in capacity factor and the effects of capacity expansion. For relatively short term future
estimates (e.g., a year or two into the future), the methodology employed as is may
provide a reasonable estimate of absolute emissions. However, the relative range of
uncertainty estimated using the methods presented here is likely to be indicative of the
relative range of uncertainty in a future emission inventory, unless there is a large shift in
the relative contributions of different technology groups to the total inventory.
In addition to quantifying the substantial range of uncertainty in the inventory, the
case study demonstrates the capability to identify key sources of uncertainty in the
inventory. As noted, the largest contribution to uncertainty comes from one technology
group. Therefore, if it were an objective to reduce uncertainty in the overall inventory,
resources could be focused on collecting more or better data for the most sensitive
technology group. Knowledge of key sources of uncertainty can also aid in identifying
where it is not necessary to target additional data collection. For example, even though
there were some discrepancies in the fit of a parametric distribution to one of the
emission factors, that particular emission factor does not contribute substantially to
uncertainty in the overall inventory. Therefore, there would not be a large benefit
associated with improving the characterization of uncertainty for that particular input.
The project has demonstrated a probabilistic approach for development of
emission inventories. Because of the widespread use of inventories for policy making,
planning, and research purposes, it is important that the quality of the inventories be
known and that any shortcomings in the inventories be identified and prioritized for
improvement. The method illustrated here enables quantification of the variability and
uncertainty in each input to an inventory, quantification of the precision of the inventory,
and identification of key sources of uncertainty that can be targeted for reduction via
additional data collection and research. The latter is an especially critical concern when
allocating scarce dollars to potentially expensive field studies or surveys.
The quantification of uncertainty has many important implications for decisions.
For example, it enables analysts and decision makers to evaluate whether time series
trends are statistically significant or not. It enables decision makers to determine the
likelihood that an emissions budget will be met. Inventory uncertainties can be used as
input to air quality models to estimate uncertainty in predicted ambient concentrations,
which in turn can be compared to ambient air quality standards to determine the
likelihood that a particular control strategy will be effective in meeting the standards. In
addition, using probabilistic methods, it is possible to compare the uncertainty reduction
benefits of alternative emission inventory development methods, such as those based
upon generic versus more site-specific data. Thus, the methods presented here allow
decision makers to assess the quality of their decisions and to decide on whether and how
to reduce the uncertainties that most significantly affect those decisions.
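As one illustration of the budget question, the likelihood that an inventory falls at or below a given budget can be read from the cumulative distribution of the inventory. The sketch below is a simplified example, not a result of the case study: it approximates the six month total inventory with a normal distribution fitted to the reported mean and 95 percent range, which ignores the skewness noted in Section 6.5, and the budget of 90,000 tons is a hypothetical value chosen only for demonstration.

```python
from statistics import NormalDist

# Reported six month total inventory (tons of NOx): mean and 95 percent range.
mean_tons = 84_800
p025, p975 = 71_800, 99_900

# Assumption: approximate the uncertainty distribution as normal, with a
# standard deviation implied by the width of the 95 percent range.
z975 = NormalDist().inv_cdf(0.975)  # about 1.96
sigma = (p975 - p025) / (2.0 * z975)
inventory = NormalDist(mu=mean_tons, sigma=sigma)

budget_tons = 90_000  # hypothetical emissions budget, for illustration only
prob_meet = inventory.cdf(budget_tons)
print(f"Probability that the inventory is within the budget: {prob_meet:.2f}")
```

With Monte Carlo output available, the same probability could instead be estimated as the fraction of simulated inventory values that fall at or below the budget, which would retain the asymmetry of the simulated distribution.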
It is recommended that future work focus on two main areas: (1) further
development of methods for quantification of variability and uncertainty in emission
inventories; and (2) application of methods to additional case studies. One
methodological need is to obtain improved fits of parametric distributions to data. For
example, in the case of the NOx emission factor for the tangential-fired furnace group
with combustion controls, it was not possible to obtain a good fit to the data using a
single component parametric distribution. However, it may be possible to obtain a much
better fit using a mixture of two or more distributions. The datasets used in this work are
extensive and of high quality compared to many other emission factor data
sets for other pollutants and/or emission sources. For example, emission factor data for
hazardous air pollutant emissions may be based on a very small number of measurements
and/or may include non-detected measurements. Methods for addressing these situations
should be included in the probabilistic analysis framework.
The case study in this work represents only one emission source and pollutant.
Future work should include demonstration of the probabilistic emission inventory
capability for other combinations of emission sources and pollutants.
8.0 ACKNOWLEDGMENTS
The authors acknowledge the support of the Office of Air Quality Planning and
Standards (OAQPS) of the U.S. Environmental Protection Agency, which funded most of
this work. Some support for the methodological components of this work was also
provided via U.S. EPA STAR Grants Nos. R826766 and R826790. The authors
appreciate the guidance and encouragement of Mr. Steve Rhomberg, formerly with U.S.
EPA, and Ms. Rhonda Thompson of U.S. EPA. The authors also thank Mr. Zhen Xie for
his contributions to the development of the internal database used in the AUVEE
prototype software.
93
9.0 REFERENCES
Ang A. H.-S., and W. H. Tang (1984), Probability Concepts in Engineering Planningand Design, Volume 2, John Wiley and Sons, New York.
Bammi, S., and H. C. Frey (2001), "Quantification Of Variability and Uncertainty inLawn And Garden Equipment NOx and Total Hydrocarbon Emission Factors,"Proceedings of the Annual Meeting of the Air & Waste Management Association,Orlando, FL, June 2001 (in press).
Box, G. E. P., and M.E. Muller (1958), “A Note on the Generation of Random NormalDeviates,” Annals of Mathematical Statistics, 29:610-611.
Cheng, R. C. H. (1977), “The Generation of Gamma Variables with Non-integral ShapeParameter,” Applied Statistics, 26:71-75.
Cohen, A.C., and B. Whitten (1988), Parameter Estimation in Reliability and Life SpanModels, M. Dekker: New York.
Cullen, A.C., and H.C. Frey (1999), Probabilistic Techniques in Exposure Assessment: AHandbook for Dealing with Variability and Uncertainty in Models and Inputs,Plenum Press: New York.
D’Agostino, R.B., and M.A. Stephens, eds. (1986), Goodness-of-Fit Techniques, M.Dekker: New York.
Efron, B., and R.J. Tibshirani (1993), An Intoduction to the Bootstrap, Monographs onStatistics and Applied Probability 57, Chapman & Hall: New York.
EPA (1995), Compilation of Air Pollutant Emission Factors, AP-42 5th Edition andSupplements, Office of Air Quality Planning and Standards, U.S. EnvironmentalProtection Agency, Research Triangle Park, NC.
EPA (1996), Summary Report for the Workshop on Monte Carlo Analysis, EPA/630/R-96/010, Risk Assessment Forum, Office of Research and Development, U.S.Environmental Protection Agency, Washington, DC. September.
EPA (1997), Guiding Principles for Monte Carlo Analysis, EPA/630/R-97/001, U.S.Environmental Protection Agency, Washington, D.C., March.
EPA (1999), Report of the Workshop on Selecting Input Distributions for ProbabilisticAssessment, EPA/630/R-98/004, U.S. Environmental Protection Agency,Washington, D.C.
Frey, H.C. (1997), “Variability and Uncertainty in Highway Vehicle Emission Factors,”Emission Inventory: Planning for the Future (held October 28-30 in ResearchTriangle Park, NC), Air and Waste Management Association, Pittsburgh,Pennsylvania, October, pp. 208-219.
94
Frey, H.C. (1998a), “Quantitative Analysis of Variability and Uncertainty in Energy andEnvironmental Systems,” Chapter 23 in Uncertainty Modeling and Analysis inCivil Engineering, B. M. Ayyub, ed., CRC Press: Boca Raton, FL, pp. 381-423.
Frey, H.C. (1998b), “Methods for Quantitative Analysis of Variability and Uncertainty inHazardous Air Pollutant Emissions,” Paper No. 98-105B.01, Proceedings of the91st Annual Meeting, Air & Waste Management Association, Pittsburgh, PA.
Frey, H.C., and R. Bharvirkar (2001), "Quantification of Variability and Uncertainty: ACase Study of Power Plant Hazardous Air Pollutant Emissions," in The RiskAssessment of Environmental and Human Health Hazards: A Textbook of CaseStudies, D. Paustenbach, Ed., John Wiley and Sons: New York. In press.
Frey, H.C., and D.E. Burmaster (1999), “Methods for Characterizing Variability andUncertainty: Comparison of Bootstrap Simulation and Likelihood-BasedApproaches,” Risk Analysis, 19(1):109-130, February.
Frey, H.C., R. Bharvirkar, R. Thompson, and S. Bromberg (1998), “Quantification ofVariability and Uncertainty in Emission Factors and Inventories,” Proceedings ofthe Conference on the Emission Inventory, Air and Waste ManagementAssociation, Pittsburgh, Pennsylvania, December.
Frey, H.C., R. Bharvirkar, J. Zheng (1999). Quantitative Analysis of Variability andUncertainty in Emissions Estimation; Final Report, Prepared by North CarolinaState University for Office of Air Quality Planning and Standards, U.S.Environmental Protection Agency, Research Triangle Park, NC.
Frey, H.C., R. Bharvirkar, and J. Zheng (1999b), “Quantification of Variability andUncertainty in Emission Factors,” Paper No. 99-267, Proceedings of the 92ndAnnual Meeting (held June 20-24 in St. Louis, MO), Air and Waste ManagementAssociation, Pittsburgh, Pennsylvania, June (CD-ROM).
Frey, H.C., and S. Li (2001); "Quantification of Variability and Uncertainty in NaturalGas-fueled Internal Combustion Engine NOx and Total Organic CompoundsEmission Factors," Proceedings of the Annual Meeting of the Air & WasteMangement Association, Orlando, FL, June (in press).
Frey, H.C., and D.S. Rhodes (1996), “Characterizing, Simulating, and AnalyzingVariability and Uncertainty: An Illustration of Methods Using an Air ToxicsEmissions Example,” Human and Ecological Risk Assessment, 2(4):762-797.
Frey, H.C., and D.S. Rhodes (1998), “Characterization and Simulation of UncertainFrequency Distributions: Effects of Distribution Choice, Variability, Uncertainty,and Parameter Dependence,” Human and Ecological Risk Assessment, 4(2):423-468.
Frey, H.C., and L.K. Tran (1999), Quantitative Analysis of Variability and Uncertainty inEnvironmental Data and Models: Volume 2. Performance, Emissions, and Cost
95
of Combustion-Based NOx Controls for Wall and Tangential Furnace Coal-FiredPower Plants, Report No. DOE/ER/30250--Vol. 2, Prepared by North CarolinaState University for the U.S. Department of Energy, Germantown, MD
Frey, H.C., J. Zheng (2000), User’s Guide for the Prototype Software for Analysis ofVariability and Uncertainty in Emissions Estimation (AUVEE), Prepared by NorthCarolina State University for the U.S. Environmental Protection Agency,Research Triangle Park, NC.
Hahn, G.J., and S.S. Shapiro (1967), Statistical Models in Engineering, John Wiley andSons, New York.
Harter, L.H. (1984), “Another Look at Plotting Positions,” Communications inStatistical-Theoretical Methods, 13(13):1613-1633.
Hattis, D., and D.E. Burmaster (1994), “Assessment of Variability and UncertaintyDistributions for Practical Risk Analyses,” Risk Analysis, 14(5):713:729.
Hazen, A. (1914), “Storage to be Provided in Impounding Reservoirs for MunicipalWater Supply,” Transaction of the Americal Society of Civil Engineers, 77:1539-1640.
Holland, D.M., and T. Fitz-Simons (1982), "Fitting Statistical Distributions to AirQuality Data by the Maximum Likelihood Method," Atmospheric Environment,16(5):1071-1076.
Johnson, N.L., and S. Kotz (1970a), Continuous Univariate Distributions-1, Distributionsin Statistics, Hoghton Mifflin: Boston.
Johnson, N.L., and S. Kotz (1970b), Continuous Univariate Distributions-2,Distributions in Statistics, Hoghton Mifflin: Boston.
Kini, M.D., and H.C. Frey (1997), Probabilistic Evaluation of Mobile Source AirPollution, Volume 1: Probabilistic Modeling of Exhaust Emissions from LightDuty Gasoline Vehicles, Prepared by North Carolina State University for Centerfor Transportation and the Environment, Raleigh, NC.
Law, A.M., and W.D. Kelton (1991), Simulation Modeling and Analysis 2d Ed.,McGraw-Hill: New York.
Marsaglia,G. and T.A. Bray (1964), “A Convenient Method for Generating NormalVariables,” SIAM Review, 6:260-264.
Morgan, M.G., and M. Henrion (1990), Uncertainty: A Guide to Dealing withUncertainty in Quantitative Risk and Policy Analysis, Cambridge UniversityPress: New York.
96
NRC (1991). Rethinking the Ozone Problem in Urban and Regional Air Pollution,National Academy Press: Washington, D.C.
NRC (1994), Science and Judgment in Risk Assessment, National Academy Press:Washington, D.C.
NRC (2000), Modeling Mobile Source Emissions, National Academy Press,Washington,D.C.
Pollack, A.K., P. Bhave, J. Heiken, K. Lee, S. Shepard, C. Tran, G. Yarwood, R.F.Sawyer, and B.A. Joy (1999), Investigation of Emission Factors in the CaliforniaEMFAC7G Model. PB99-149718INZ, Prepared by ENVIRON InternationalCorp, Novato, CA, for Coordinating Research Council, Atlanta, GA
Rhodes, D.S., and H.C. Frey (1997), “Quantification of Variability and Uncertainty inAP-42 Emission Factors: NOx Emissions from Coal-Fired Power Plants,” InEmission Inventory: Planning for the Future, The Proceedings of A SpecialtyConference, Air & Waste Management Association: Pittsburgh, PA, pp. 147-161.
Rubin, E.S., M. Berkenpas, H.C. Frey, and B. Toole-O’Neil (1993), “Modeling theUncertainty in Hazardous Air Pollutant Emissions,” Proceedings, SecondInternational Conference on Managing Hazardous Air Pollutants, Electric PowerResearch Institute, Palo Alto, CA.
Seiler, F.A., and J.L. Alvarez (1996), “On the Selection of Distributions for StochasticVariables,” Risk Analysis, 16(1):5-18
Seinfeld, J.H. (1986), Atmospheric Chemistry and Physics of Air Pollution, John Wileyand Sons, New York.
Small, M.J. (1990). “Probability Distributions and Statistical Estimation,” Chapter 5 inUncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and PolicyAnalysis, Morgan, M.G., and Henrion, M., Cambridge University Press: NewYork.
Steel, R.G.D., and J.H. Torrie (1980), Principles and Procedures of Statistics, ABiometrical Approach 2d ed., McGraw-Hill: New York.