

CONF 791016

Proceedings of the

1979 DOE STATISTICAL SYMPOSIUM


sponsored by

Computer Sciences Division
Oak Ridge National Laboratory
Union Carbide Corporation, Nuclear Division

Department of Energy


CONF-791016
Distribution Category UC-32

Proceedings of the 1979
DOE STATISTICAL SYMPOSIUM

October 24-26, 1979
Riverside Motor Lodge
Gatlinburg, Tennessee

Compiled and Edited by
Donald A. Gardiner
Tykey Truett

Sponsored by the
Department of Energy

Date Published: September 1980

Prepared by
Mathematics and Statistics Research Department
Computer Sciences Division
Oak Ridge National Laboratory
Oak Ridge, Tennessee

Operated by
Union Carbide Corporation

This book was prepared as an account of work sponsored by an agency of the United States Government. Neither the United States Government nor any agency thereof, nor any of their employees, makes any warranty, express or implied, or assumes any legal liability or responsibility for the accuracy, completeness, or usefulness of any information, apparatus, product, or process disclosed, or represents that its use would not infringe privately owned rights. Reference herein to any specific commercial product, process, or service by trade name, trademark, manufacturer, or otherwise, does not necessarily constitute or imply its endorsement, recommendation, or favoring by the United States Government or any agency thereof. The views and opinions of authors expressed herein do not necessarily state or reflect those of the United States Government or any agency thereof.

DISTRIBUTION OF THIS DOCUMENT IS UNLIMITED


Contents

PREFACE v

WELCOME vii

INVITED ADDRESS

Estimating the Performance of Subgroups, Herbert Robbins 3

RESEARCH PAPERS

Component Effects in Mixture Experiments, Gregory F. Piepel 7

Application of Discriminant Analysis and Generalized Distance Measures to Uranium Exploration, J. J. Beauchamp, C. L. Begovich, V. E. Kane, and D. A. Wolf 20

Sensitivity Study on the Parameters of the Regional Hydrology Model for the Nevada Nuclear Waste Storage Investigations, R. L. Iman, H. P. Stephens, J. M. Davenport, R. K. Waddell, and D. I. Leap 44

Evaluation of ECC Bypass Data with a Nonlinear Constrained MLE Technique, Thomas A. Bishop, Robert P. Collier, and Robert E. Kurth 62

Nuclear Fuel Rod Stored-Energy Uncertainty Analysis: A Study of Error Propagation in Complicated Models, A. R. Olsen and M. E. Cunningham 71

Use of 3^N Parallel Flats Fractional Factorial Designs in Computer Code Uncertainty Analysis, Donald A. Anderson, Jack Mardekian, and Dale M. Rasmuson 82

Estimating Residential Energy Consumption from the National Interim Energy Consumption Survey: A Progress Report, Thomas H. Woteki 91

Energy R&D in the Private Sector, Pasquale Sullo, John Wilkinson, and Howard K. Nason 94

The Autopsy Tissue Program, Terry Fox and Gary Tietjen 97

Mortality Among Men Employed (Between 1943 and 1947) at a Uranium-Processing Plant, P. P. Polednak and E. L. Frome 116

A Critique of Person-Rem-Years as an Estimator of Risk from Ionizing Radiation, Peter G. Groer 128

WORKSHOPS

WORKSHOP I: MODEL EVALUATION

Sensitivity Analysis and a National Energy Model Example, Michael D. McKay 139

Strategies for Model Evaluation, James Gruhl 146



Methodological Issues in Analyzing Automotive Transportation Policy Models:

Two Case Studies, Barbara C. Richardson 154

Discussion Session 163

WORKSHOP II: RISK ANALYSIS

Statistics and Risk Analysis—Getting the Acts Together, Roger H. Moore 171

Distinctions Between Risk Analysis and Risk Assessment, William D. Rowe 177

Some Notable Probability and Statistics Problems Encountered in Risk Analysis

Applications, W. E. Vesely 178

Risk Analysis and Reliability, V. R. R. Uppuluri 187

Discussion Session 192

WORKSHOP III: ANALYSIS OF LARGE DATA SETS

The Many Facets of Large, Daniel B. Carr 201

The Analysis of Large Data Sets Project—Computer Science Research Areas, Bob Burnett 205

Help, Where Are We?, Richard J. Beckman 209

A Small Shopping List for Large Data Set Analysis, Leo Breiman 211

Discussion Session 217

WORKSHOP IV: RESOURCE ESTIMATION

An Analysis of a Statistical Model to Predict Future Oil, John H. Schuenemeyer 223

Statistical Analysis of Petroleum Exploration, Peter Bloomfield 226

Data Requirements for Forecasting the Year of World Peak Petroleum Production, Lawrence J. Drew and David H. Root 227

Discussion Session 231

LIST OF ATTENDEES 235


Preface

The 1979 DOE Statistical Symposium was the fifth in the series of annual symposia designed to bring together statisticians and other interested parties who are actively engaged in helping to solve the nation's energy problems. This symposium was different from the previous conferences in two important respects.

For the first time, the conference was held at a location remote from an energy laboratory site. The program, therefore, did not include tours of facilities such as were held at Los Alamos Scientific Laboratory, Oak Ridge National Laboratory, Pacific Northwest Laboratory, and Sandia Laboratories.

The previous four symposia contained sessions during which unsolved or incompletely solved problems were presented, followed by sessions in which possible solutions to the problems were discussed. For the 1979 symposium, the problem sessions were replaced by workshops on model evaluation, risk analysis, analysis of large data sets, and resource estimation.

As had been the practice in the past, the program included presentations of technical papers. These sessions centered around exploration and disposal of nuclear fuel, general energy-related topics, and health-related issues. The Program Committee, chaired by Irene Montie, consisted of David Hall, Ronald L. Iman, Victor E. Kane, and Ray A. Waller.

At the conclusion of the symposium, participants met for a critique and to discuss plans for future symposia. The 1980 DOE Statistical Symposium will be hosted by Lawrence Livermore Laboratory at the Claremont Hotel in Oakland, California, on October 29-31, 1980. The format of the 1979 symposium (e.g., a combination of contributed papers and workshops) will be continued for the 1980 meeting.

Although the 1979 symposium differed from previous conferences of the series in some respects, it continued to share a significant common denominator—that is, the goal of furthering communication among statisticians in the DOE community. Attendees included both statisticians and nonstatisticians from the DOE laboratories, the academic community, and industries involved in energy-related research. The contributions of all those who participated in and supported this symposium are greatly appreciated.


Welcome

Roger F. Hibbs

President, Nuclear Division
Union Carbide Corporation

It is a pleasure to welcome you to the 1979 DOE Statistical Symposium. I am particularly pleased that your organizing committee has brought together a large number of statisticians and other scientists to communicate ideas aimed at seeking solutions to our energy problems.

The last time I had the privilege of addressing this group was in 1976. At that time the energy storm clouds had already formed. During the intervening three years the situation has gone from bad to worse and even the most optimistic of us foresee an even graver situation unless this nation's basic energy policies undergo a complete overhaul.

I believe that our energy policy is in a state of chaos. Some aspects are totally ridiculous. One of the most ludicrous has to do with our oil pricing policy. For example, while we are paying about $24 a barrel for foreign oil, we insist that domestic crude oil be priced at $6.47* a barrel. It should be obvious that this pricing scheme is going to do nothing to stimulate exploration for new domestic oil deposits, even though "new" oil might be priced somewhat higher.

This past spring and summer we went through a contrived gasoline shortage. Norman Macrae, Deputy Editor of The Economist, comments that the gasoline crisis had nothing to do with gasoline but was created by the allocations devised to mitigate the problem. He further explains that, if you establish allocations slightly more restrictive than those of the previous year during the summer holiday season, then of course you have a shortage where the cars stay, countered by a surplus in the areas where they don't have gas to drive to.

But oil is only one aspect of our energy woes. One of the most serious situations centers around our nuclear program. Because of a concern for proliferation, our national policy is not to sell enriched uranium for nuclear power facilities to other nations unless they abide by certain very stringent guidelines we have established. Furthermore, we stopped all reprocessing efforts in this country and strongly discouraged it worldwide. I have no doubt that those who promulgated this policy were sincere. The only problem is that such a program is unworkable, because other nations have sought alternate ways to fuel their nuclear power industry.

*Exxon figure as of 10/8/79.



As a result of our counterproductive policy, commercially competitive enriching services are now available from France, the United Kingdom, Germany, The Netherlands, and the Soviet Union. Further, countries such as Japan, Pakistan, South Africa, Italy, and Brazil are rapidly developing this capability. Also, the United Kingdom, France, Germany, Japan, the Soviet Union, and Brazil either have or are developing breeder reactors and/or reprocessing capabilities. Thus, our policy has not only lost us the worldwide market for enriching services but has actually enhanced proliferation since we are no longer considered a reliable supplier.

This policy is also making it difficult for our reactor manufacturers to compete internationally and may well spell the end of a reactor manufacturing industry in the United States. This kind of thinking, coupled with the unbelievable regulatory morass, may well result in a self-fulfilling policy of denying the availability of nuclear power to the people of the United States.

To be sure, proliferation is a very serious problem, but it can only be controlled by appropriate political processes, not by the United States unilaterally sticking its head in the sand and hoping the problem will go away.

The increased use of coal is also not without its serious difficulties. Mining is dangerous, and people don't like strip mines. To burn coal under boilers, elaborate measures are required to remove SO2, NOx, and particulates from the flue gas. Even after all this, the effect of increased CO2 in the atmosphere — the greenhouse effect — is uncertain.

We also need the contribution of solar energy; however, it isn't apparent that solar energy in any of its forms offers any significant short-term solution.

Looking at the events of the last three years, it is difficult to be very optimistic. The much publicized incident at Three Mile Island is being made to serve the cause of antinuclear groups. Reprocessing, the breeder reactor, and waste isolation remain essentially political, not technical, issues. Near-term significant increases in the use of coal and solar energy do not seem reasonable. Meanwhile, the shadow of the Russian bear hangs menacingly over our oil supply.

I could continue to discuss this subject for much more time than has been allotted for my welcome to you. The problems we face must be dealt with. Statisticians, who are familiar with probabilities and risk-benefit relationships, can make a positive contribution to the problems we face by actively supporting a more realistic policy in the national energy field.

Please accept my best wishes for a good meeting, and welcome to East Tennessee.


Invited Address


Estimating the Performance of Subgroups

Herbert Robbins
Brookhaven National Laboratory

Upton, New York

The usual problems of statistical inference concern the relationship between a pair $(\theta, X)$, where $\theta$ is an unobservable parameter and $X$ is a random variable whose probability distribution depends in some specified manner on the true value of $\theta$. From an observed sequence $X_1, \ldots, X_n$ of independent $X$ values with constant unknown $\theta$, we try to estimate or test hypotheses about the true value of $\theta$.

Empirical Bayes theory attacks a different but equally important problem: From an observed sequence $X_1, \ldots, X_n$ of $X$ values such that $(\theta_i, X_i)$ are independent random vectors with the same unknown joint distribution, we try to determine some properties of that joint distribution, such as the value of $E(\theta \mid X = x)$ for some fixed $x$, or more generally $E(\theta \mid X \in A)$, where $A$ is some subset of interest in the sample space of $X$.

For example, suppose that $(\theta, X)$ is such that

given $\theta$, $X$ is $N(\theta, 1)$, and   (1)

$\theta$ has an arbitrary unknown distribution function $G$.   (2)

Then the conditional density function of $X$ given $\theta$ is

$f(x \mid \theta) = \varphi(x - \theta)$, where $\varphi(x) = \exp(-x^2/2)/\sqrt{2\pi}$,

and the unconditional or marginal density function of $X$ is

$f(x) = \int_{-\infty}^{\infty} \varphi(x - \theta)\, dG(\theta).$

Moreover,

$E(\theta \mid X = x) = \int_{-\infty}^{\infty} \theta\, \varphi(x - \theta)\, dG(\theta) \big/ f(x).$

Now,

$f'(x) = \int_{-\infty}^{\infty} (\theta - x)\, \varphi(x - \theta)\, dG(\theta) = -x f(x) + E(\theta \mid X = x) f(x),$


so

$E(\theta \mid X = x) = x + f'(x)/f(x)$   (3)

and

$E(\theta \mid X \ge x) = \dfrac{\displaystyle\int_x^{\infty} t f(t)\, dt - f(x)}{\displaystyle\int_x^{\infty} f(t)\, dt}\,.$   (4)

To estimate Eq. (4) from a random sample $X_1, \ldots, X_n$ we can define

$W_i = \begin{cases} X_i & \text{if } X_i \ge x \\ 0 & \text{otherwise} \end{cases}, \qquad Y_i = \begin{cases} 1 & \text{if } X_i \ge x \\ 0 & \text{otherwise} \end{cases},$

and

$\dfrac{\sum_{i=1}^{n} W_i - n \hat f(x)}{\sum_{i=1}^{n} Y_i}\,,$

which will converge to Eq. (4) as $n \to \infty$ by the law of large numbers. What remains is to estimate $f(x)$ from $X_1, \ldots, X_n$. Using any consistent method for density estimation enables us to estimate Eq. (4) consistently for arbitrary unknown $G$.
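The estimator above is easy to try numerically. The following minimal sketch (added here for illustration; it is not part of the address) uses a Gaussian kernel density estimate as the unspecified "consistent method for density estimation" and simulates a prior $G = N(0, 4)$; the function and variable names are assumptions made for the example.

```python
# Minimal sketch of the empirical Bayes estimator of E(theta | X >= x); illustrative only.
import numpy as np
from scipy.stats import gaussian_kde

def estimate_conditional_mean_tail(X, x):
    """Estimate E(theta | X >= x) via Eq. (4): (sum W_i - n * fhat(x)) / sum Y_i."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    W = np.where(X >= x, X, 0.0)           # W_i = X_i if X_i >= x, else 0
    Y = (X >= x).astype(float)             # Y_i = 1  if X_i >= x, else 0
    f_hat = gaussian_kde(X)(x)[0]          # consistent density estimate of f at x
    return (W.sum() - n * f_hat) / Y.sum()

# Quick check: theta_i ~ G, X_i | theta_i ~ N(theta_i, 1); G taken to be N(0, 4).
rng = np.random.default_rng(0)
theta = rng.normal(0.0, 2.0, size=20000)
X = theta + rng.normal(size=theta.size)
print(estimate_conditional_mean_tail(X, 1.0))   # empirical Bayes estimate
print(theta[X >= 1.0].mean())                   # "oracle" average, for comparison
```

With a large sample the printed estimate should track the oracle average of the $\theta_i$ whose $X_i$ exceed the cutoff, even though $G$ is never used by the estimator.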

Similar considerations of an even simpler character apply where $X$, given $\theta$, is Poisson, negative exponential, etc. The binomial case is exceptional and requires special treatment. To estimate $E(p \mid X = x)$, where $X$ = number of successes in $m$ independent trials with constant success probability $p$ (randomly varying from one set of $m$ trials to another), we propose the estimate

$T_n = \dfrac{1}{m}\left[\dfrac{x^2\, n(x)}{x\, n(x) + (m - x + 1)\, n(x - 1)} + \dfrac{(m - x)(x + 1)\, n(x + 1)}{(x + 1)\, n(x + 1) + (m - x)\, n(x)}\right],$   (5)

where for $x = 0, 1, \ldots, m$, $n(x)$ = number of $X_1, \ldots, X_n$ equal to $x$. This is a weighted average of consistent estimates of Jensen inequality lower and upper bounds for $E(p \mid X = x)$. Monte Carlo simulations indicate that $T_n$ performs well even for moderate values of $n$, especially after isotonizing to take into account the fact that $E(p \mid X = x)$ is an increasing function of $x$.
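The binomial estimator can be sketched in the same way. The code below is an illustration added here (not part of the address) and implements Eq. (5) as reconstructed above, so its exact form should be checked against the original paper; the Beta prior and all names are assumptions for the example.

```python
# Minimal sketch of the binomial estimator T_n of Eq. (5) as reconstructed above; illustrative only.
import numpy as np

def robbins_binomial_estimate(counts, m):
    """Return T_n(x) for x = 0..m, given counts[x] = n(x) = number of samples with x successes."""
    counts = np.asarray(counts, dtype=float)
    T = np.full(m + 1, np.nan)
    for x in range(m + 1):
        n_x = counts[x]
        n_lo = counts[x - 1] if x >= 1 else 0.0
        n_hi = counts[x + 1] if x <= m - 1 else 0.0
        lower = x * n_x / (x * n_x + (m - x + 1) * n_lo) if x > 0 else 0.0
        upper = (x + 1) * n_hi / ((x + 1) * n_hi + (m - x) * n_x) if x < m else 1.0
        T[x] = (x / m) * lower + ((m - x) / m) * upper   # weighted average of the two bound estimates
    return T

# Example: p_i ~ Beta(2, 5), X_i ~ Binomial(m, p_i); compare T_n(x) with the empirical E(p | X = x).
rng = np.random.default_rng(1)
m, n = 10, 50000
p = rng.beta(2, 5, size=n)
X = rng.binomial(m, p)
counts = np.bincount(X, minlength=m + 1)
print(robbins_binomial_estimate(counts, m))
print([p[X == x].mean() for x in range(m + 1)])
```

A simple sanity check on the reconstructed formula: when $p$ is the same for every subgroup, both bracketed ratios reduce to $p$ and $T_n(x) \approx p$ for every $x$, as it should.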


Research Papers


Component Effects in Mixture Experiments*

Gregory F. Piepel
Battelle Pacific Northwest Laboratories

Richland, Washington

ABSTRACT

In a mixture experiment, the response to a mixture of $q$ components is a function of the proportions $x_1, x_2, \ldots, x_q$ of components in the mixture. Experimental regions for mixture experiments are often defined by constraints on the proportions of the components forming the mixture. The usual (orthogonal direction) definition of a factor effect does not apply because of the dependence imposed by the mixture restriction.

A direction within the experimental region in which to compute a mixture component effect is presented and compared to previously suggested directions. This new direction has none of the inadequacies or errors of previous suggestions while having a more meaningful interpretation. The distinction between partial and total effects is made. The uses of partial and total effects (computed using the new direction) in modification and interpretation of mixture response prediction equations are considered. The suggestions of the paper are illustrated in an example from a glass development study in a waste vitrification program.

INTRODUCTION

In a mixture experiment, the response to a mixture of $q$ components is a function of the proportions $x_1, x_2, \ldots, x_q$ of components in the mixture. Because each $x_i$ represents the proportion of the $i$th component in the mixture, the following restrictions hold:

$0 \le x_i \le 1 \quad (i = 1, 2, \ldots, q); \qquad \sum_{i=1}^{q} x_i = 1. \qquad (1)$

Physical, economical, and theoretical considerations often impose additional restrictions in the form of lower and upper bounds

*Report PNL-SA-7993, Battelle Pacific Northwest Laboratories, Richland, Washington. Work supported by the Department of Energy under Contract No. EY-76-C-06-1830.

$0 \le a_i \le x_i \le b_i \le 1 \quad (i = 1, 2, \ldots, q); \qquad (2)$

$c_1 \le \sum x_i \le d_1,$
$c_2 \le \sum x_i \le d_2,$
$\;\;\vdots$
$c_k \le \sum x_i \le d_k \qquad (3)$

Page 15: Proceedings of the 1979 DOE STATISTICAL SYMPOSIUM

on the levels of components or sums of components, where each sum in Eq. (3) is taken over a specified subset of the components. It is assumed that all restrictions are nondegenerate.

The functional form of the response

$E(y) = f(x_1, x_2, \ldots, x_q)$

is usually not known. Often first- or second-degree polynomial approximation models

$E(y) = \sum_{i=1}^{q} \beta_i x_i \qquad (4)$

$E(y) = \sum_{i=1}^{q} \beta_i x_i + \sum_{i<j} \beta_{ij} x_i x_j \qquad (5)$

are used. Other approximation models have been proposed; for example, see Becker [1], Cox [2], Draper and St. John [3], Hackler, Kriegel, and Hader [4], and Kenworthy [5].

Geometrically, restriction (1) defines the experimental region as a regular $(q - 1)$-dimensional simplex. In general, restrictions (2) and (3) reduce the experimental region given by Eq. (1) to an irregular $(q - 1)$-dimensional hyperpolyhedron. Considerations relating to experimental design, analysis, and interpretation of results usually cause a nonorthogonal set of spanning vectors with $q$ dependent components to be chosen instead of transforming the region to a $(q - 1)$-dimensional subspace having orthogonal basis vectors with $q - 1$ independent components. Within this framework, the usual definitions of factor and interaction effects no longer apply. In an orthogonal basis-independent factors situation, the effect of a factor is measured in a direction orthogonal to the space spanned by the other factors. When moving in such a direction, the level of the factor of interest varies while the levels of the other factors remain fixed. This is impossible in the nonorthogonal mixture space because of restriction (1). For the same reasons, the usual concept of interaction is not applicable to mixture experiments. Cross-product terms of components (in models that contain them) are referred to as nonlinear blending terms, as are other non-first-degree terms.

The effect of a mixture component on the response of interest will be defined in a meaningful way in the sections to follow.

TOTAL AND PARTIAL COMPONENT EFFECTS

The concept of a factor effect usually accompanies an analysis of variance (AOV) approach to a problem rather than a regression approach. There is no difficulty in applying the concept to a regression framework, although certain distinctions in the definition need to be made. Factor effects in the AOV approach are usually considered only when no interaction exists. However, in a regression approach, curvature terms (which are analogous to AOV interaction) will usually be present. In this situation, a factor effect becomes a function of the amount of change in the factor level. This is taken into account in the following definitions of effects of mixture components.

Definition 1

The total effect of component $i$ will be defined as

$TE_i = E(y \mid \mathbf{x}_H) - E(y \mid \mathbf{x}_L)\,,$

where $\mathbf{x}_H$ and $\mathbf{x}_L$ are points in the experimental region such that component $i$ is at its highest and lowest values, respectively. The points $\mathbf{x}_H$ and $\mathbf{x}_L$ will depend upon the constraints of the form of Eqs. (1)-(3) which determine the experimental region.

Definition 2

The partial effect due to a change $\Delta_i$ in the $i$th component from $\mathbf{x}_M$ will be defined as

$PE_i(\Delta_i, \mathbf{x}_N \mid \mathbf{x}_M) = E(y \mid \mathbf{x}_N) - E(y \mid \mathbf{x}_M)\,,$

where $\mathbf{x}_M$ and $\mathbf{x}_N$ are points in the experimental region such that $\mathbf{x}_N$ represents the point obtained from $\mathbf{x}_M$ due to the $\Delta_i$ change in the $i$th component. The point $\mathbf{x}_N$ will have changes in the other components of $\mathbf{x}_M$ offsetting $\Delta_i$.

Page 16: Proceedings of the 1979 DOE STATISTICAL SYMPOSIUM

Both definitions are somewhat vague at this time and are meant to give a general understanding of the terminology. The effects obviously depend upon the model in use and the starting ($\mathbf{x}_L$ or $\mathbf{x}_M$) and ending ($\mathbf{x}_H$ or $\mathbf{x}_N$) points. Note that the starting and ending points determine a direction along which the effect will be measured. The selection of a meaningful direction which keeps these points within the experimental region is the object of this paper and will be addressed later.

The terminology of partial and total effects will be used when speaking about both true and estimated effects. The meaning will be clear from the context.

BACKGROUND AND PREVIOUS SUGGESTIONS

It was noted in the introductory section that a factor effect in an orthogonal nonrelated factors situation is measured in a direction orthogonal to the space spanned by the other factors. This approach was proposed for mixture experiments by Snee and Marquardt [6] for first-degree polynomial approximation models. They gave two formulas for computing effects (total estimated) depending upon the experimental region. For an experimental region given by the restrictions in Eq. (1), they computed the effect (total) of component $i$ as

$\hat\beta_i - (q - 1)^{-1} \sum_{j \ne i} \hat\beta_j \qquad (6)$

and as

(7)

where $R_i = b_i - a_i$, for an experimental region given by Eqs. (1) and (2).

Formula (6) is equivalent to the suggestion of this paper for a simplex experimental region. Formula (7), however, may give serious errors when some components have large ranges and some have small ranges. It is seen from Fig. 1 that formula (7) may compute effects based on points outside of the experimental region or even based on impossible points.

Although the strategy of varying one component and fixing the others is impossible in mixture experiments, a parallel strategy observing the restriction $\sum x_i = 1$ can be developed.

Suppose a total effect of component $i$ is of interest. An increase in the proportion of component $i$ must be offset by decreases in the other components. To be useful in estimating the effect of component $i$, the decreases in the other components must neither mask nor inflate the effect. Choosing the decrease all from one component or splitting it in such a way as to take more from a component with a smaller range than from one with a bigger range are obviously not the preferred strategies. Without prior information, the best strategy appears to be to decrease each component proportional to the ratio of the mixture (without component $i$) in which it "usually" appears. The "usual" appearance is probably best taken to be the overall centroid of the experimental region.

Fig. 1. Three-component example (experimental region shaded) displaying possible errors of formula (7).

Cox [2] first proposed a method to adjust the other components, given a change $\Delta_i$ in component $i$. When $x_i$ is changed to $x_i + \Delta_i$, $x_j$ is changed to

$x_j - \Delta_i\, s_j/(1 - s_i)\,, \qquad (8)$

for $j = 1, 2, \ldots, i - 1, i + 1, \ldots, q$, where $\mathbf{s} = (s_1, s_2, \ldots, s_q)$ is some reference mixture. Cox utilized this in calculating changes in response but did not apply the method to an actual definition of an effect of a component. Snee and Marquardt [6] did apply Cox's work and proposed the following formula for the effect (total) of component $i$ relative to a reference mixture $\mathbf{s}$:

(9)
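Cox's adjustment in Eq. (8) is simple to apply in code. The snippet below is a minimal sketch added for illustration (it is not from the paper); the mixture, reference mixture, and function name are assumptions.

```python
# Minimal sketch of Cox's adjustment, Eq. (8); illustrative only, not from the original paper.
import numpy as np

def cox_adjust(x, i, delta, s):
    """Increase component i of mixture x by delta, offsetting the others per Eq. (8)."""
    x, s = np.asarray(x, dtype=float), np.asarray(s, dtype=float)
    x_new = x - delta * s / (1.0 - s[i])   # x_j - delta * s_j / (1 - s_i) for the other components
    x_new[i] = x[i] + delta                # component i itself is increased by delta
    return x_new

x = np.array([0.5, 0.3, 0.2])              # current mixture (proportions sum to 1)
s = np.array([0.4, 0.4, 0.2])              # reference mixture
print(cox_adjust(x, 0, 0.05, s), cox_adjust(x, 0, 0.05, s).sum())
```

Because the offsets are proportional to the reference mixture, the adjusted proportions still sum to one, which is the point of the construction.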

Equation (9), however, may give serious errors depending upon the experimental region. The direction in which the effect is measured is determined by the line joining the reference mixture and the $i$th component simplex vertex as pictured in Fig. 2. In this example, the total effect of component 1 would be estimated using two points $\mathbf{x}_H$ and $\mathbf{x}_L$ outside the experimental region.

It is seen in both Figs. 1 and 2 that the related formulas, Eqs. (7) and (9), could be in error due to the implicitly determined directions in which the component (total) effects are measured. One solution might be to estimate partial effects over changes inside the experimental region. It should be recognized that in some situations, the directions implied in formulas (7) and (9) would be allowable (i.e., use points inside the experimental region) for measuring total and/or partial component effects. In fact, these or other directions might be preferred over the suggestion in the next section, given prior information about the general effect of the components on the response. This will usually not be the case, however. Our goal should be a formula (with its inherent direction) that will be appropriate (1) for each component, (2) for partial or total effects, and (3) for any experimental region.

APPROPRIATE DIRECTION FOR MEASURING COMPONENT EFFECTS

Discussion in the previous section indicated that the direction for measuring component effects should depend upon the overall centroid of the experimental region. The application of Cox's work [2] by Snee and Marquardt [6] was shown to be inadequate in some cases due to the shape, size, and location of the experimental region within the simplex defined by Eq. (1). However, it is still possible to use Cox's work by negating the consequences of the size, shape, and location of the experimental region through a pseudo-component transformation. A pseudo-component transformation is appropriate when the experimental region is defined by Eqs. (1)-(3), where there is at least one $a_i > 0$. Assuming that restrictions of the form of Eq. (3) are not degenerate, that is, $c_j < \sum x_i < d_j$, the pseudo-component transformation is given by

$x_i' = \dfrac{x_i - a_i}{1 - L} \qquad (i = 1, 2, \ldots, q)\,, \qquad (10)$

where

$L = \sum_{j=1}^{q} a_j < 1\,,$

and $x_i'$ represents the proportion of the $i$th pseudo-component in the mixture. (Gorman [7] and others have noted the desirability of the pseudo-component transformation to improve the conditioning of the design.)
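The pseudo-component transformation of Eq. (10) and its inverse can be written directly. The sketch below is illustrative only (not from the paper), with hypothetical lower bounds and mixture.

```python
# Minimal sketch of the pseudo-component transformation, Eq. (10); illustrative only.
import numpy as np

def to_pseudo(x, a):
    """Map proportions x to pseudo-components x' = (x - a) / (1 - sum(a))."""
    x, a = np.asarray(x, dtype=float), np.asarray(a, dtype=float)
    return (x - a) / (1.0 - a.sum())

def from_pseudo(x_prime, a):
    """Inverse map back to the original proportions."""
    x_prime, a = np.asarray(x_prime, dtype=float), np.asarray(a, dtype=float)
    return a + (1.0 - a.sum()) * x_prime

a = np.array([0.41, 0.055, 0.0])    # illustrative lower bounds for three components
x = np.array([0.50, 0.10, 0.40])    # a mixture satisfying the bounds
xp = to_pseudo(x, a)
print(xp, xp.sum())                 # pseudo-components also sum to 1
print(from_pseudo(xp, a))           # recovers the original mixture
```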

The basic strategy, then, for choosing an appropriate direction in which to compute effects will be to transform to pseudo-components and then apply Cox's work. (Snee [8] suggested a similar strategy in relation to computation of gradients but has not applied it to mixture component effects.) Geometrically, this is the direction determined by the line joining the centroid of the experimental region to the "pseudo-component simplex" vertex of the component of interest. This is pictured for the three-component example of Fig. 1 in Fig. 3.

Fig. 2. Three-component example (experimental region shaded) displaying possible errors of formula (9).


Fig. 3. Three-component example depicting pseudo-component simplex and suggested direction for measuring component effects.

Note that, for total effects, the suggested direction uses points in the experimental region in accordance with Definition 1. This will always occur using this strategy.

COMPUTING COMPONENT EFFECTS

Following the strategy outlined at the end of the previous section, formulas for linear and quadratic mixture component effects (total and partial) can be obtained, where linear effects are obtained when Eq. (4) is the model and quadratic effects when Eq. (5) is the model. The partial linear effect of component $i$ is

$PE_i(\Delta_i, \mathbf{x}_N \mid \mathbf{x}_M) = \dfrac{\Delta_i'}{1 - s_i'} \Bigl( \hat\beta_i' - \sum_{j=1}^{q} s_j'\, \hat\beta_j' \Bigr)\,, \qquad (11)$

and the partial quadratic effect of component $i$ is

(12)

where the prime notation indicates pseudo-component values. The $x_i'$ values in formula (12) are proportions of components in the starting mixture $\mathbf{x}_M$ (see Definition 2). Formulas (11) and (12) are appropriate for experimental regions defined by restrictions such as Eqs. (1)-(3) where $\Delta_i$ is an allowable change in the proportion of component $i$. Total effects may be found by substituting the range $R_i$ for $\Delta_i$. When the experimental region is given by restrictions (1) and (2), $R_i = b_i - a_i$. When additional restrictions, Eq. (3), are added, $R_i$ will depend upon the exact form of the restrictions.

For other models, it may be easier to compute effects using Definitions 1 and 2 than to derive complicated formulas. Points $\mathbf{x}_M$ and $\mathbf{x}_N$ are obtained using Eq. (8), where the overall centroid of the experimental region is used as the reference mixture.

The definition (Definitions 1 and 2) approach to computing mixture component effects is easier to implement and use with standard computer regression routines than a formula approach and is hence recommended for all models. A summary of the definition approach will fill in the above outline.

1. $PE_i(\Delta_i, \mathbf{x}_N \mid \mathbf{x}_M) = \hat y(\mathbf{x}_N) - \hat y(\mathbf{x}_M) = \hat y'(\mathbf{x}_N') - \hat y'(\mathbf{x}_M')\,,$

where the prime notation indicates pseudo-component values. Note that the prime notation implies the prediction equation based on pseudo-component design points, where $\hat y(\mathbf{x}) = \hat y'(\mathbf{x}')$.

2. The points $\mathbf{x}_M'$ and $\mathbf{x}_N'$ are

$\mathbf{x}_M' = \mathbf{s}'\,, \qquad x_{Ni}' = s_i' + \Delta_i'\,, \qquad x_{Nj}' = s_j' - \Delta_i'\, s_j'/(1 - s_i') \quad (j \ne i)\,,$

where $\mathbf{s}' = (s_1', s_2', \ldots, s_q')$ is the overall centroid of the experimental region in pseudo-components and $\Delta_i$ is some allowable change in the $i$th component.

3. Partial effects are computed as in step 1. Total effects may be computed by choosing $\Delta_i = R_i$, where $R_i$ is the range of the $i$th component along the direction of measurement.
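The three steps above translate into a short routine. The sketch below is added for illustration (it is not the author's code): it assumes a fitted first-degree prediction equation in pseudo-components supplied as a coefficient vector, and the bounds, centroid, coefficients, and change are hypothetical. For a first-degree model the result agrees with formula (11).

```python
# Minimal sketch of the definition approach (steps 1-3 above); illustrative only.
import numpy as np

def to_pseudo(x, a):
    return (np.asarray(x, float) - a) / (1.0 - a.sum())

def predict_linear(beta, x_pseudo):
    """First-degree prediction equation in pseudo-components (assumed model form)."""
    return float(np.dot(beta, x_pseudo))

def partial_effect(i, delta, a, s, beta):
    """PE_i(delta, x_N | x_M) with x_M taken as the overall centroid s (original scale)."""
    a, s = np.asarray(a, float), np.asarray(s, float)
    s_p = to_pseudo(s, a)                        # centroid in pseudo-components
    delta_p = delta / (1.0 - a.sum())            # the change expressed in pseudo-components
    x_n = s_p - delta_p * s_p / (1.0 - s_p[i])   # Cox adjustment, Eq. (8), in pseudo-components
    x_n[i] = s_p[i] + delta_p
    return predict_linear(beta, x_n) - predict_linear(beta, s_p)

# Illustrative three-component example (all numbers are assumptions).
a = np.array([0.2, 0.1, 0.0])                    # lower bounds
s = np.array([0.5, 0.3, 0.2])                    # overall centroid of the region
beta = np.array([1.0, 4.0, -2.0])                # fitted pseudo-component coefficients
print(partial_effect(0, 0.05, a, s, beta))       # partial effect of component 1 for delta = 0.05
```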

USING COMPONENT EFFECTS

Component effects have two basic applications in mixture experiments, model reduction and interpretation of the prediction equation (response surface). Of primary importance in model reduction is the situation when a component has no effect. In such a case the component may be treated as a slack variable within its allowable range for future experiments. Depending upon the shapes of the experimental region and predicted response surface, it may be desirable to fix the proportion of the component at some appropriate level, thereby eliminating it from further consideration. The appropriate level will depend upon the goals of the experiment (i.e., finding maximum or minimum of response surface) as well as the experimental region and fitted surface. If allowable, the proportion may often be set to zero. The situation of equal component effects is also important in model reduction. These two situations each provide a restriction on the parameters useful in model reduction and for a linear model are

1. no effect (component $i$):

$\beta_i' - \sum_{j=1}^{q} s_j'\, \beta_j' = 0\,; \qquad (13)$

2. equal effects (components $i$ and $k$):

$l_i \Bigl( \beta_i' - \sum_{j=1}^{q} s_j'\, \beta_j' \Bigr) - l_k \Bigl( \beta_k' - \sum_{j=1}^{q} s_j'\, \beta_j' \Bigr) = 0\,, \qquad (14)$

where $l_j = \Delta_j'/(1 - s_j')$.

Note that the restrictions for a linear model are different than those given by Snee and Marquardt [6] due to the new direction in which a mixture component effect is measured. Variances of the effects for setting confidence intervals and testing hypotheses can be obtained easily from the definition approach to computing effects. In general,

$\mathrm{Var}[TE_i] = \mathrm{Var}[\hat y'(\mathbf{x}_H') - \hat y'(\mathbf{x}_L')] = (\mathbf{u}_H - \mathbf{u}_L)'(\mathbf{U}'\mathbf{U})^{-1}(\mathbf{u}_H - \mathbf{u}_L)\, \sigma^2\,,$

$\mathrm{Var}[PE_i(\Delta_i, \mathbf{x}_N \mid \mathbf{x}_M)] = (\mathbf{u}_N - \mathbf{u}_M)'(\mathbf{U}'\mathbf{U})^{-1}(\mathbf{u}_N - \mathbf{u}_M)\, \sigma^2\,,$

where $\mathbf{U}$ represents the pseudo-component version of the original design matrix $\mathbf{X}$ while $\mathbf{u}$ represents a pseudo-component version of a point $\mathbf{x}$ in the experimental region.

It is seen that variances of effects depend upon the starting and ending points $\mathbf{x}_L$ ($\mathbf{x}_M$) and $\mathbf{x}_H$ ($\mathbf{x}_N$). This is of no concern when computing total effects since the starting and ending points are fixed by the direction of measurement and range of the component. However, when considering partial effects, the variance will depend on the starting point $\mathbf{x}_M$ as well as on the change $\Delta_i$. This must be remembered when comparing partial effects of the different components.

Although component effects are useful tools for model reduction, they are much more useful as aids in interpreting and understanding the response and the fitted prediction equation. In an orthogonal non-related-factors situation, the parameter estimates are used for interpretation. The usual parameter estimates in a mixture experiment do not aid in the interpretation of the response surface. Cox [2] has proposed transformations and restrictions that provide parameter estimates with interpretive meaning. These are hard to utilize, however, while the proposed mixture component effects are easy to obtain.

It is easily seen that partial and total effects will yield entirely different information for most models with nonlinear blending terms. Hence, the uses of partial and total effects in interpretation are considered separately for linear and nonlinear models.


For linear (first-degree) models, the total effect of a component is just a multiple of a partial effect. Partial effects due to some common change (say $\Delta$ = 0.01, 0.02, 0.05, ...) should be calculated for all components. This will allow comparisons among the components as to relative importance per unit of change. Extending the partial effects to total effects allows comparison of the components relative to their allowable ranges.

For models with nonlinear blending terms, total effects are not multiples of partial effects due to curvature in the response surface. For each component, several partial effects should be calculated using different starting points and different changes $\Delta_i$. For component comparison purposes, several magnitudes should be the same. The overall centroid of the experimental region is a good starting point, as are points of intersection of the boundary of the experimental region with the direction lines in which effects are measured. Total effects should be computed and analyzed along with the partial effects calculated.

The variances of all effects computed should be determined and considered in the comparisons. Recall that effect variances will depend upon the starting point and length of change $\Delta$. The amount of dependence upon $\Delta$ will depend upon the experimental design (design matrix $\mathbf{X}$) for the mixture experiment.

In comparing various mixture component effects, $100(1 - \alpha)\%$ confidence intervals can be calculated using

$\text{Effect} \pm t_{\alpha/2}(n - p)\, \bigl[ (\mathbf{u}_N - \mathbf{u}_M)'(\mathbf{U}'\mathbf{U})^{-1}(\mathbf{u}_N - \mathbf{u}_M)\, \hat\sigma^2 \bigr]^{1/2}\,,$

where
$n$ = number of observations,
$p$ = number of terms in model,
$t_{\alpha/2}(n - p)$ = Student's $t$ table value for $n - p$ degrees of freedom,
$\mathbf{u}_M$ = beginning pseudo-component point for effect,
$\mathbf{u}_N$ = ending pseudo-component point for effect.

These confidence intervals are based on the assumption of normally distributed errors.
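The interval computation can be sketched in a few lines. The code below is illustrative (not from the paper), assuming the design matrix $\mathbf{U}$ is available in pseudo-components and that $\hat\sigma^2$ is the error mean square from the fit; the numbers are made up, not the glass data.

```python
# Minimal sketch of the effect variance and t confidence interval above; illustrative only.
import numpy as np
from scipy import stats

def effect_confidence_interval(effect, u_M, u_N, U, sigma2_hat, n, p, alpha=0.05):
    """CI: effect +/- t_{alpha/2}(n-p) * sqrt((u_N - u_M)' (U'U)^{-1} (u_N - u_M) * sigma2_hat)."""
    d = np.asarray(u_N, float) - np.asarray(u_M, float)
    var_effect = d @ np.linalg.inv(U.T @ U) @ d * sigma2_hat
    half_width = stats.t.ppf(1.0 - alpha / 2.0, n - p) * np.sqrt(var_effect)
    return effect - half_width, effect + half_width

# Illustrative numbers only: 12 runs, 3-term linear model in pseudo-components.
rng = np.random.default_rng(2)
U = rng.dirichlet(np.ones(3), size=12)            # pseudo-component design points
u_M = np.array([1/3, 1/3, 1/3])                   # centroid (starting point)
u_N = np.array([0.43, 0.27, 0.30])                # ending point along the effect direction
print(effect_confidence_interval(0.25, u_M, u_N, U, sigma2_hat=0.02, n=12, p=3))
```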

ELEVEN-COMPONENT GLASS EXAMPLE

One interesting type of mixture experiment involves the mixing of chemicals to form glass. The author is currently involved with a waste vitrification study, the goal of which is to develop waste glass formulations with acceptable levels of such properties as leachability, viscosity, volatility, conductivity, and crystallinity. The key toward this goal is the ability to understand the effects the eleven chemical components have on the various properties.

The experimental region for this problem is defined by the constraints of Eqs. (1)-(3):

$0.41 \le x_1 \le 0.60$
$0.055 \le x_2 \le 0.15$
$0 \le x_3 \le 0.16$
$0 \le x_4 \le 0.14$
$0 \le x_5 \le 0.09$
$0.09 \le x_6 \le 0.17$
$0 \le x_7 \le 0.065$
$0 \le x_8 \le 0.08$
$0 \le x_9 \le 0.035$
$0 \le x_{10} \le 0.035$
$0 \le x_{11} \le 0.035$   (15)

$\sum_{i=1}^{11} x_i = 1$   (16)

$0.54 \le x_1 + x_2 + x_3 \le 0.80$
$0.1 \le \ldots$   (17)

Restrictions (15) and (17) result from theoretical considerations that indicate when a mixture of chemicals will melt into a glass, and also incorporate past knowledge about acceptable glass formulations.

The experimental points and observed values for acid leach (weight % lost) from a preliminary study are given in Table 1. The experimental design is based on an extreme vertices design, where a Wynn type [9, 10] algorithm was utilized. The extreme vertices for the experimental region were generated by CONVRT (constrained vertices), an algorithm developed by the author to handle additional constraints of the form of Eq. (3). (See [11-13] about extreme vertices designs.) A point was added to the center of the region to check for curvature and several points were replicated to obtain an estimate of the experimental error.

Chemical analysis composition proportions are given in Table 1 instead of actual design points, since significant differences between the two were found throughout the design.

Before component effects can be computed, a prediction equation must be fit to the data. Theoretical and statistical considerations indicated common logarithms of the leach values should be fit. Table 2 displays the AOV and pseudo-coefficients for a log leach prediction equation which had a nonsignificant lack of fit at the $\alpha = 0.05$ level.

Although a significant fit is important, it is noted that much useful information from the mixture component technique is still available when a significant lack of fit is not "too large." This will depend upon the specific property and amount of data available.

Table 1. Experimental points (chemical analysis composition proportions) and observed acid leach values (weight % lost) for the eleven-component glass example. (The individual entries are not reproduced here.)

Table 2. Analysis of variance and pseudo-coefficients for acid leach example.

ANALYSIS OF VARIANCE

SOURCE        DF    SS          MS
TOTAL         43    75.54938
REGRESSION    22    75.12478    3.41476
ERROR         21     0.42453    0.02022
LOF           15     0.37103    0.02474
PURE ERROR     6     0.05350    0.00892

ADJUSTED R-SQUARED = 0.9885
LACK OF FIT F TEST: F = 2.77

(The table also lists the estimated pseudo-component coefficients and their standard deviations for the 23 model terms: the 11 linear component terms SiO2, B2O3, Al2O3, CaO, MgO, Na2O, ZnO, TiO2, Cr2O3, Fe2O3, and NiO, and 12 cross-product terms. The individual values are not reproduced here.)

A useful scheme of computing a variety of partial effects is to use the overall centroid of the experimental region as the starting point, while the effects of increasing incremental changes of 0.01 (positive and negative) are computed. These effects for the leach example are illustrated in Table 3. Although these effects are computed as changes from the centroid, it is easy to compute the change between any two points represented in the table as follows. Let $P_{\Delta_1}$ and $P_{\Delta_2}$ be any two points related to effects as computed in Table 3, where it is assumed without loss of generality that $\Delta_2 > \Delta_1$. Then

$PE(|\Delta_2| \pm |\Delta_1|) = PE(\Delta_2) - PE(\Delta_1)\,, \qquad (18)$

$\mathrm{Var}[PE(|\Delta_2| \pm |\Delta_1|)] = \mathrm{Var}[PE(\Delta_2)] + \mathrm{Var}[PE(\Delta_1)] - 2\,\mathrm{Cov}[\hat y(P_{\Delta_2}), \hat y(P_{\Delta_1})] + 2\,\mathrm{Cov}[\hat y(P_{\Delta_2}), \hat y(\mathbf{s})] + 2\,\mathrm{Cov}[\hat y(P_{\Delta_1}), \hat y(\mathbf{s})] - 2\,\mathrm{Var}[\hat y(\mathbf{s})]\,, \qquad (19)$

where $P_{\Delta_1}$ will be the new starting point and the change will be $|\Delta_2| + |\Delta_1|$ if the centroid $\mathbf{s}$ is between $P_{\Delta_1}$ and $P_{\Delta_2}$ and $|\Delta_2| - |\Delta_1|$ if $\mathbf{s}$ is not between $P_{\Delta_1}$ and $P_{\Delta_2}$.


Table 3. Partial and total effects for eleven-component leach example. Standard deviations of effects are below effect value (in parentheses). (The individual partial-effect entries for positive and negative changes from the centroid are not reproduced here.)

Centroid predicted log leach value is -0.044 with standard deviation .037.

Total effects (SD): SiO2 -2.814 (.111); B2O3 1.208 (.105); Al2O3 1.610 (.097); CaO 1.095 (.108); MgO 0.829 (.095); Na2O 1.134 (.097); ZnO 0.479 (.097); TiO2 -0.818 (.116); Cr2O3 -0.586 (.106); Fe2O3 -0.483 (.079); NiO 0.705 (.080).


Although the component effects and their standard deviations in Table 3 are essential for significance testing, they are unwieldy in that form for interpretation purposes. The graphical presentation in Fig. 4 presents predicted values for all components along their individual "effect directions" plotted versus mole percent change in the component from its centroid level. Such a graphical presentation presents component effects utilizing predicted values and allows easy comparisons among the chemical components.

In examining Fig. 4 and Table 3 it is seen that each component has a monotonic effect in the direction of interest. This certainly need not occur when there are nonlinear blending terms in the model. When it does occur, the amount of work required to understand the fitted equation is substantially reduced.

The significance of effects visually identified from Fig. 4 can be checked from the information in Table 3 with the two-sided t-test (normally distributed errors assumed). Of primary interest are components with no effect on the property. Such a component would have a horizontal effect curve in Fig. 4. In this example, all effects are significantly different from zero at the 95% confidence level.

For this same eleven-component design, several other properties were measured. A neutral (soxhlet) leach test was conducted and produced the effects plots in Fig. 5.

The fit model used to generate these effects contained all first-degree terms and ten second-degree cross-product terms. It had a lack of fit $F$ = 2.14, which can be compared to an $F$-distribution with 17 and 6 degrees of freedom.

It is seen in Fig. 5 that several components (SiO2, MgO, Na2O, ZnO, TiO2, Cr2O3, and NiO) have little or no effect on log soxhlet leach values over all or part of their ranges.

This work is being applied to nuclear and chemical waste disposal problems, where it is desired to minimize leachability. Component effects can be used to indicate favorable formulations for glass with low leachability. In the acid leach example, proportions of SiO2, TiO2, Fe2O3, and Cr2O3 should be as high as possible, while proportions of B2O3, Al2O3, CaO, MgO, ZnO, Na2O, and NiO should be as low as possible, if low acid leaching glass is desired. Of course other formulations may be suggested for other properties. Component effects can be useful tools in multivariate response problems.

In closing this example section, it should be remembered that the leach rate design data and fits are all from a preliminary study and that leaching results can be affected by the laboratory procedures used. Hence, the examples of this section should only be viewed as such, and not as a final study product.

Fig. 4. Log acid leach effects plot. Predicted values versus mole percent change in component.

Fig. 5. Log soxhlet leach effects plot. Predicted values versus mole percent change in component.


SUMMARY

This paper proposes a meaningful direction in which to measure mixture component effects that has none of the inadequacies or errors of previous suggestions. Partial and total effects along this direction are defined. A technique to compute these effects which can easily be added to slightly modified existing regression routines was suggested. Although component effects are obtained from a fit prediction equation, the overall effects plots are often not greatly affected by competing prediction equations. The display and methods of analysis of the computed partial and total effects were discussed. The main use of mixture component effects is seen to be in understanding and interpreting the prediction equation (response surface), although effects can be utilized in model reduction or screening. Finally, the suggestions of the paper were applied to a previously unpublished data set from a glass mixture experiment.

The author recognizes the need for other analysis techniques (see [15-18] and others) for mixture experiments. The proposed techniques are meaningful and easy tools to use to help understand the dependence of the property or response upon the proportions of components of the mixture of interest.

ACKNOWLEDGMENTS

The author would like to acknowledge the helpful discussions of Dr. R. D. Snee on this and other related topics in the field of mixture experiments. Also, the author would like to thank J. N. Diven for his computer implementation of CONVRT and the Wynn and exchange algorithms used in the design of the glass mixture experiment example, as well as L. A. Chick for his chemical, ceramic, and glass expertise.


REFERENCES

1. N. G. Becker, "Models for the Response of a Mixture," J. R. Stat. Soc. B 30: 349-58 (1968).
2. D. R. Cox, "A Note on Polynomial Response Functions for Mixtures," Biometrika 58: 155-59 (1971).
3. Norman R. Draper and Ralph C. St. John, Models and Designs for Experiments with Mixtures, University of Wisconsin Technical Reports 360 and 361, 1975.
4. W. C. Hackler, W. W. Kriegel, and R. J. Hader, "Effect of Raw Material Ratios on Absorption of Whiteware Compositions," J. Am. Ceram. Soc. 39: 20-25 (1956).
5. O. O. Kenworthy, "Factorial Experiments with Mixtures Using Ratios," Industrial Quality Control Series XIX, No. 12, 24-26.
6. Ronald D. Snee and Donald W. Marquardt, "Screening Concepts and Designs for Experiments with Mixtures," Technometrics 18: 19-29 (1976).
7. J. W. Gorman, "Fitting Equations to Mixture Data with Restraints on Compositions," J. Qual. Technol. 2: 186-94 (1970).
8. Ronald D. Snee, "Discussion of Cornell and Ott (1975)," Technometrics 17: 425-30 (1975).
9. R. C. St. John and N. R. Draper, "D-Optimality for Regression Designs: A Review," Technometrics 17: 15-23 (1975).
10. H. P. Wynn, "The Sequential Generation of D-Optimum Experimental Designs," Ann. Math. Stat. 41: 1655-64 (1970).
11. R. A. McLean and V. L. Anderson, "Extreme Vertices Design of Mixture Experiments," Technometrics 8: 447-54 (1966).
12. Ronald D. Snee, "Experimental Designs for Quadratic Models in Constrained Mixture Spaces," Technometrics 17: 149-59 (1975).
13. Ronald D. Snee and D. W. Marquardt, "Extreme Vertices Designs for Linear Mixture Models," Technometrics 16: 399-408 (1974).
14. Donald W. Marquardt and Ronald D. Snee, "Test Statistics for Mixture Models," Technometrics 16: 533-37 (1974).
15. J. A. Cornell and J. W. Gorman, "On the Detection of an Additive Blending Component in Multicomponent Mixtures," Biometrics 34: 251-63 (1978).
16. J. A. Cornell and L. Ott, "The Use of Gradients to Aid in the Interpretation of Mixture Response Surfaces," Technometrics 17: 409-24 (1975).
17. Sung H. Park, "Selecting Contrasts Among Parameters in Scheffe's Mixture Models: Screening Components and Model Reduction," Technometrics 20: 273-79 (1978).
18. Ronald D. Snee, "Techniques for the Analysis of Mixture Data," Technometrics 15: 517-28 (1973).


Application of Discriminant Analysis and Generalized Distance Measures to Uranium Exploration

J. J. Beauchamp, C. L. Begovich, V. E. Kane, and D. A. Wolf
Computer Sciences Division
Union Carbide Corporation, Nuclear Division*
Oak Ridge, Tennessee

ABSTRACT

The goal of the National Uranium Resource Evaluation Program is to estimate the nation's uranium resources. Using discriminant analysis methods on hydrogeochemical data collected in the NURE Program aids in formulating geochemical models that can be used to identify the anomalous areas used in resource estimation. Discriminant analysis methods have been applied to data from the Plainview, Texas, quadrangle, which has approximately 850 groundwater samples with more than 40 quantitative measurements per sample. Discriminant analysis methods involving estimation of misclassification probabilities, variable selection, and robust discrimination are particularly useful. In addition, a method using generalized distance measures enables assignment of samples to a background population or to a mineralized population whose parameters have been estimated from separate studies. Each of these methods is relevant in identifying areas of possible interest to uranium exploration.

INTRODUCTION

Multivariate statistical methods provide a natural framework for studying the interrelationships of geochemical parameters considered in mineral exploration. Typically, samples of some media are collected over wide geographic areas and analyzed for numerous geochemical parameters. The interpretive phase consists of separating typical background samples from anomalous samples that are possibly associated with mineralization. The following presentation explores the use of discriminant analysis methodology to (1) identify the regional geochemical parameters that may be important in formulating regional geochemical models, (2) validate the geologic origins of the samples, and (3) identify possible mineralization-related samples based on either the background population or known mineralized populations. Groundwater data from the Plainview, Texas, National Topographic Map Series quadrangle, collected as part of the National Uranium Resource Evaluation (NURE) Program, are used for illustration.

*UCC-ND is operated for the Department of Energy under Contract No. W-7405-eng-26.

Before applying discriminant analysis methods, it is important to consider data preprocessing methods, of which treatment of censored laboratory data and evaluation of distributional considerations of the variables are important aspects. Additionally, it is assumed that the samples are preliminarily assigned to the geologic unit representing their geologic origins. This assignment will be assessed by discriminant analysis as part of the preprocessing.

DESCRIPTION OF STUDY AREA

The NURE Program selected the Plainview quadrangle for its study because hydrogeochemical data and an evaluation of potential uranium mineralization using geologic, radiometric, drilling, and hydrogeochemical data for that region [1] had been published. An area of approximately 20,350 km2 (7860 sq miles) located in the Great Plains between latitude 34° and 35° N and longitude 100° and 102° W, the Plainview quadrangle is divided into rolling plains to the east and the Llano Estacado of the southern High Plains to the west by the generally north-south trending Caprock Escarpment.

Although the subsidence of the Palo Duro Basin exerted an influence on depositional systems throughout the end of the Permian section, units exposed at the surface are relatively flat-lying, creating a relatively simple geology. The San Andres (Blaine) formation (coded as PGEB), Whitehorse group (coded as PGWC), and Quartermaster group (coded as POQ) dip generally 4.7 m/km (25 ft/mile) to the west. The Dockum group and Ogallala formation (coded as TPO) overlie the Permian section at an unconformable surface and dip generally to the southeast at approximately 1.9 m/km (10 ft/mile). These are relatively shallow dips, and the lack of major faulting results in relatively predictable subsurface geology with the surficial outcrop pattern complicated by the relatively deep erosion in the Palo Duro Canyon and the Caprock Escarpment.

The uniform dips of the bedrock units result in relatively simple groundwater flow patterns for those aquifers that are penetrated by domestic water wells. The water-table surface in the Permian section east of the Caprock Escarpment follows the general topographic slope; west of the Caprock Escarpment, groundwater in the Permian units may flow along the regional dip, which is approximately 4.7 m/km (25 ft/mile) to the west. Groundwater flow in the Dockum group appears to be to the southeast, with a regional water-table dip of approximately 1.9 m/km (10 ft/mile). The Ogallala formation is the major aquifer in the western half of the quadrangle, and groundwater flow appears to be relatively consistent with a regional water-table dip to the southeast at approximately 1.9 m/km (10 ft/mile). Although regional topography and dip are relatively consistent, there may be many variations in groundwater flow direction resulting from changes in permeability, local structures, and overpumping of the aquifer.

DESCRIPTION OF DATA AND PREPROCESSING

Field personnel collected well- and spring-water samples and shipped them to Oak Ridge, Tennessee, where chemical analyses consisting of about 40 measurements were performed. A form with approximately 30 items of information was completed for each sample in the field, the information on the form including the assignment of a geologic-producing horizon of the groundwater and the measurement of the total alkalinity, pH, and conductivity (converted to specific conductance) of the water.

Chemical analysis procedures included fluorometry for uranium, atomic absorption for arsenic, spectrophotometry for sulfate, and plasma source emission spectrometry for Ba, B, Ca, Li, Mg, Mo, Na, V, and Zn. Complete details of field and laboratory procedures have been described [2] and delineated [3]. Basic data analyses and displays of both the groundwater and stream sediment data [4] are also available. (A computer tape of all data can be obtained from Dalton Atkins, Grand Junction Office Information System Project, Union Carbide Corporation, Nuclear Division Computer Applications Department, Building 4500N, Oak Ridge National Laboratory, P.O. Box X, Oak Ridge, TN 37830.)

Because the discriminant analysis methods to be used in subsequent analyses require noncensored (above the laboratory detection limit) data whose distribution may be reasonably approximated by a multivariate normal, the preprocessing portion of the data analysis consists of an initial screening to reduce the number of variables under consideration and an evaluation of the statistical distribution of the selected variables. The following steps are involved.

1. Check and note the samples within each geologic unit for missing data.

2. Chart histogram and probability plots. (These plots may be used as an initial screen for obvious non-normality.)

3. Determine the proportion of noncensored data.

4. Test for normality for each of those variables that are not affected by missing data or a large proportion of censored observations.
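The following is a minimal sketch of this screening for one geologic unit, not the original analysis code; the data frame, column names, detection limits, and the 10% censoring cutoff are hypothetical, and scipy's Shapiro-Wilk test stands in for the fuller battery of normality checks used in the paper.

    # Sketch of preprocessing steps 1-4 for one geologic unit.
    # Column names, detection limits, and the censoring cutoff are hypothetical.
    import numpy as np
    import pandas as pd
    from scipy import stats

    def screen_variables(df, detection_limits, max_censored=0.10):
        """Keep nearly uncensored variables and test normality on raw and log scales."""
        results = {}
        for var, dl in detection_limits.items():
            x = df[var].dropna()                            # step 1: note missing data
            prop_censored = float(np.mean(x <= dl))         # step 3: censoring proportion
            if prop_censored > max_censored:
                continue                                    # too much censoring; drop variable
            w_raw, p_raw = stats.shapiro(x)                 # step 4: raw-scale normality
            w_log, p_log = stats.shapiro(np.log(x[x > 0]))  # lognormal alternative
            results[var] = {"prop_censored": prop_censored,
                            "p_normal": p_raw,
                            "p_lognormal": p_log}
        return pd.DataFrame(results).T

    # Example call with hypothetical detection limits (ppb):
    # screened = screen_variables(unit_samples, {"U": 0.2, "B": 10.0, "Li": 5.0})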

Examining the data for the original set of observed variables for each of the five geologic units of interest revealed that problems of missing data or censored observations were nonexistent or minimal for the 12 variables U, B, Ba, Ca, Li, Mg, Na, Zn, SO4, specific conductance, total alkalinity, and pH. These variables were used in the distribution evaluation portion of the preprocessing.

Summary statistics for the 12 variables, as well as three additional uranium-related variables (arsenic, vanadium, and molybdenum) to be used later, were examined for each geologic unit. Robust measures, such as the median, were examined to evaluate the influence of any censored data. After the deletion of observations with missing values, the sample sizes for the different geologic groups were 357 for TPO, 121 for PGEB, 275 for PGWC, and 77 for POQ. Groundwater samples from


the Dockum group were not considered because of the small sample size (16).

Figure 1 is a histogram of the calcium variable of POQ, where an apparent bimodal distribution appears. The observations associated with each mode were clustered together in distinct geographic regions. Therefore, the POQ samples were partitioned into two separate subgroups, denoted by POQE (41 samples) and POQW (36 samples), for subsequent analyses. The above pattern may be the result of a higher proportion of carbonate lithology in the upper POQ in the west.

[Fig. 1. Histogram of calcium (ppm) for the POQ samples.]

The next phase of the data preprocessing was to determine whether the distribution of the observed variables could be reasonably approximated by the normal or lognormal distribution. A combination of different techniques was used to evaluate the adequacy of the logarithmic transformation: probability plots [5]; histograms, sample skewness, and kurtosis measures [6]; Shapiro-Wilk test statistics [7]; and a modified version of the Kolmogorov-Smirnov D-statistic [8]. The tests of normality for all five geologic groups indicated that logarithmic transformation was preferred over no transformation to achieve marginal normality within each of the groups for each of the observed variables except calcium. The pH variable was not transformed because it already represented the log of a concentration and was approximately normally distributed. Therefore, in all subsequent analyses all variables except calcium and pH were transformed by using the logarithmic transformation.

DISCRIMINANT ANALYSIS

Discriminant analysis provides a criterion for classifying a collection of observation vectors into one of a specified number of groups [9-11]. In this section it will be determined whether a subset of the observed variables could be used to adequately discriminate between the five geologic groups (TPO, PGWC, PGEB, POQE, and POQW). The chemical concentrations of individual samples will be used to determine whether the prior assignment of samples to one of the five groups is tenable. Additionally, the chemical parameters that most accentuate the differences among the five units will be determined, enabling formulation of possible geochemical models to characterize the regional geochemistry.

The discriminant functions used to classify an observation vector are known linear (equal population covariances) or quadratic (unequal population covariances) functions of the observed variables, if the observation vectors follow a multivariate normal distribution. The observation vectors from each geologic unit were used to estimate the mean vector and covariance matrix of the assumed multivariate normal distribution. The estimated linear or quadratic discriminant score was used to classify each observation.

Different methods are available for reducing the number of variables used in discriminant analysis and are similar to the variable selection procedures of regression analysis. Criteria based upon a measure of the differences between groups or upon minimizing the probability of misclassification are intuitively appealing and relatively easy to apply. The Wilks Λ-statistic [12], defined as the ratio of the determinant of the within sums of cross products matrix (W) to the total sums of cross products matrix (T), is a common measure used to evaluate differences between groups. The estimated probability of misclassification is the measure used to evaluate the performance of the discriminant function and is estimated by

Σ_{i=1}^{5} π_i Σ_{j≠i} Pr(j|i) ,



where π_i is the probability that an observation vector comes from group i (for this example, π_1 = . . . = π_5 = 1/5) and Pr(j|i) is the observed proportion of vectors known to come from group i and incorrectly classified into group j by the sample discriminant function. Variations in the procedures arise when equal covariance matrices are not assumed for all groups. The applied procedures are described as follows:
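As a concrete illustration of this estimate, the short sketch below computes the overall misclassification probability from a resubstitution confusion matrix with equal priors π_i = 1/5; the counts are made up for illustration and are not the Plainview results.

    # Estimated overall misclassification probability from a confusion matrix.
    # confusion[i, j] = number of group-i samples classified into group j.
    # The counts below are illustrative only.
    import numpy as np

    def misclassification_probability(confusion, priors):
        rates = confusion / confusion.sum(axis=1, keepdims=True)   # Pr(j | i)
        off_diag = rates.sum(axis=1) - np.diag(rates)              # sum over j != i
        return float(np.dot(priors, off_diag))

    confusion = np.array([[330, 10,  5,  7,  5],
                          [ 12, 95,  6,  4,  4],
                          [  9,  8, 50,  6,  4],
                          [  5,  3,  4, 28,  1],
                          [  8,  5,  3,  2, 18]])
    priors = np.full(5, 1 / 5)
    print(misclassification_probability(confusion, priors))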

1. DISCRIM [13] chooses the subset of variables based on the minimum value of the Wilks Λ-statistic, Λ = |W|/|T|, where |A| is the determinant of the matrix A. This selection procedure assumes equal covariance matrices for the different geological groups and calculates the statistic for all possible subsets of variables.

2. BMDP7M [14] is a stepwise procedure that uses F-statistics as the default to select the best subset of variables. This procedure also assumes equal covariance matrices.

3. Modified DISCRIM [G. P. McCabe, Jr., personal communication, 1979] is similar to DISCRIM but assumes unequal covariance matrices and makes use of the modified Wilks Λ-statistic

Π_{i=1}^{5} |W_i| / |T_i| ,

where |W_i| and |T_i| are the determinants of the individual within and total sums of cross products matrices for each group.

4. Forward selection first tests for equality of the covariance matrices over the different groups by using all variables [15] and then considers each variable separately, calculates the discriminant function, and estimates the probability of misclassification by reclassifying the original data, using the estimated discriminant function. The variable with the smallest overall probability of misclassification is chosen as the best single discriminatory variable. The remaining variables are individually considered with the best single variable, and the preceding process is repeated to produce the best pair of discriminating variables in terms of the estimated probability of misclassification. The process is repeated until all of the variables are included in the discriminant function. This procedure has the property that once a variable has been included in the discriminant function, it will always be included in subsequent stages.

5. Backward selection is similar to forward selection except that it starts with the complete set of variables and drops each variable separately at the first stage. The smallest estimated probability of misclassification determines the first variable to be deleted. The procedure is repeated with the reduced set of variables to determine the second variable to be deleted. Once a variable is deleted, it is excluded from further consideration. The SAS procedure DISCRIM [16] was used to do the necessary calculations for the forward and backward selection procedures.
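A rough sketch of the forward procedure (item 4) follows; it is not the SAS or BMDP code used in the paper. scikit-learn's QuadraticDiscriminantAnalysis stands in for the quadratic discriminant function, resubstitution error is the selection criterion, class priors are left at their defaults, and the arrays X (observations by variables) and y (geologic-group labels) are assumed to exist.

    # Forward selection by estimated (resubstitution) misclassification probability,
    # using scikit-learn's QDA in place of the SAS DISCRIM calculations.
    import numpy as np
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    def resubstitution_error(X, y, cols):
        qda = QuadraticDiscriminantAnalysis().fit(X[:, cols], y)
        return np.mean(qda.predict(X[:, cols]) != y)

    def forward_select(X, y):
        remaining = list(range(X.shape[1]))
        chosen, history = [], []
        while remaining:
            best_err, best_j = min((resubstitution_error(X, y, chosen + [j]), j)
                                   for j in remaining)
            chosen.append(best_j)
            remaining.remove(best_j)
            history.append((tuple(chosen), best_err))   # best 1-, 2-, ... variable models
        return history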

For most reconnaissance geochemical data, the differences in geology will cause varying means and covariances between the geologic populations. Different covariance matrices, as were observed in the Plainview data, necessitate use of variable selection procedures based on quadratic discrimination. Therefore, the modified DISCRIM, forward, or backward procedures were found to have an advantage over the unmodified DISCRIM or stepwise BMDP7M procedures, which assume equal covariance matrices. In addition, the DISCRIM procedures have the advantage of considering all possible subsets of variables. The forward and backward procedures use a criterion that is addressed to correct allocation, a consideration if future samples are to be classified [17].

Figure 2 summarizes the results of these preliminary analyses and displays the estimated Wilks Λ-statistics and probabilities of misclassification as a function of the number of variables in the model. Figure 2 indicates that there is more than one feasible subset of variables based upon the Wilks Λ, the modified Wilks Λ-statistic, or the estimated probability of misclassification. Figure 2(a) is a plot of the estimated probability of misclassification for the different variable selection procedures considered. Figure 2(b) is a plot of Wilks Λ and modified Wilks Λ from DISCRIM. The major reduction in these criteria occurs as the number of variables in the model increases from one to three.

The results from DISCRIM and BMDP7M are identical, except that the BMDP7M stepwise procedure selected a ten-variable model as its final choice. A plot of the change in the estimated probability of misclassification going from a p to a p + 1 (p = 1, 2, . . . , 11) variable model is shown in Fig. 3 for the modified DISCRIM procedure. Small values of this change would indicate possible stopping points for the number of variables to be included in the model. Low values of the change in misclassification probability for p of 4, 7, and 10 correspond to three possible candidate models. An examination of the corresponding changes in the modified Wilks Λ-statistic in Fig. 3 shows that the major reduction in this statistic occurs at p = 7. Therefore, the seven-variable model seems appropriate, because both the probability of misclassification and the group separation show only small changes for p > 7. The seven variables included in this model are ln(U), ln(specific conductance), ln(B), ln(Ca), ln(Li), ln(Mg), and ln(SO4); these variables will be denoted as the regional variables


Fig. 2. Results of variable selection procedures: (a) probabilities of misclassification; (b) Wilks Λ and modified Wilks Λ from DISCRIM.


Fig. 3. Change in misclassification probability and modified Wilks Λ from p to p + 1 variables.

(i.e., variables that discriminate between the regional geologic units).

Additional geologic considerations might motivate the choice of a different set of variables. Table 1 displays some of the alternative model choices for the different variable selection criteria. Many of these alternative sets of variables have values of the optimization criteria which differ only slightly from the minimum value. When the geochemistry of the region is considered, one of the alternative seven-variable sets may provide a more parsimonious model for the data. Selection of a set of variables from Table 1 based on a geochemical model would probably improve the analyses discussed in the next section.

A prerequisite for the use of the previous discriminant analysis procedures is the preliminary assignment of samples to the five geologic-unit groups. This assignment of samples can be verified by standard discriminant analysis methods. Using the regional variables, all samples were classified into one of the five geologic populations. Considering only samples near the geologic contacts for reassignment, three samples were changed from the initial coding (12131, 11896 from PGWC to TPO; 11886 from TPO to PGWC). These reclassifications will, of course, have negligible influence on the Plainview analyses. However, field classification of the geologic origin is often unavailable, or more complex geology could increase sample misclassification.

A primary concern in any statistical analysis is the robustness, that is, the sensitivity to the underlying statistical assumptions, of the procedures that are employed. To evaluate the robustness of the linear and quadratic discrimination procedures used, the robust discriminant analysis methods described by Randles et al. [18, 19] were used with the seven regional variables on six pairwise discrimination problems from the five geologic groups. The 13 various possible versions of the robust discriminant analysis methods did not appreciably improve upon the standard linear and quadratic discrimination methods. The two standard methods only differed by a few misclassified samples from the best robust method, which typically made use of the Huber [20] estimate of the covariance matrix.

INTERPOPULATION DISTANCE MEASURES

If X and Y are p-dimensional column vectors, then

D2(X; Y, C) = (X - Y)' C^-1 (X - Y)

is a general functional form of the squared multidimensional distance from X to Y, where C is a p × p positive definite matrix so that D2 > 0.

If X = μ_1 and Y = μ_2, where the μ's are the mean vectors from p-dimensional multivariate normal distributions with common covariance matrix C = Σ, then D2(μ_1; μ_2, Σ) is the Mahalanobis distance [12]. The D2 distance from group i to group j may be estimated by the sample value D2(x̄_i; x̄_j, S) for the case when the covariance matrices are assumed to be equal, where x̄_i (x̄_j) is the sample mean vector for group i (j), and S is the pooled sample covariance matrix from all groups. When it is not reasonable to assume equality of the group covariance matrices, the D2 distance from group j to group i is estimated by D2(x̄_j; x̄_i, S_i), where S_i is the sample covariance matrix of the ith group.
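A small sketch of these sample distances follows; the list of group arrays is hypothetical, and the function evaluates D2 between group means using either the pooled covariance matrix (equal-covariance case) or the covariance matrix of the row group (unequal-covariance case).

    # Sketch of the estimated generalized squared distances between group means.
    # 'groups' is a hypothetical list of (n_i, p) arrays of observation vectors.
    import numpy as np

    def d2(x, y, cov):
        diff = np.asarray(x) - np.asarray(y)
        return float(diff @ np.linalg.solve(cov, diff))

    def pooled_covariance(groups):
        dfs = [g.shape[0] - 1 for g in groups]
        covs = [np.cov(g, rowvar=False) for g in groups]
        return sum(df * c for df, c in zip(dfs, covs)) / sum(dfs)

    def distance_matrix(groups, equal_cov=True):
        means = [g.mean(axis=0) for g in groups]
        S_pooled = pooled_covariance(groups)
        k = len(groups)
        D2 = np.zeros((k, k))
        for i in range(k):
            S_i = S_pooled if equal_cov else np.cov(groups[i], rowvar=False)
            for j in range(k):
                D2[i, j] = d2(means[j], means[i], S_i)   # distance from group j to group i
        return D2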

Table 2 shows values of the generalized squared multidimensional distance matrix for three different seven-variable models. The D2 distance between the different groups indicates a reasonable separation between TPO, POQW, and the remaining groups PGEB, PGWC, and POQE. Because the generalized squared distance was used in classifying the observations to the different geologic groups, it is not surprising that the estimated


[Table 1. Alternative variable subsets chosen by the DISCRIM, modified DISCRIM, forward, and backward selection procedures, among the constituents U, specific conductance, B, Ba, Ca, Li, Mg, Na, SO4, total alkalinity, Zn, and pH; the individual subsets and function values are not reproduced here. All observed constituent values except those for Ca and pH have been transformed. Function values are the Wilks Λ for DISCRIM, the modified Wilks Λ for modified DISCRIM, and the estimated probability of misclassification for the forward and backward selection procedures.]

misclassification probabilities also revealed an overlapping of the three Permian units PGEB, PGWC, and POQE. This overlap suggests that the available geochemical data cannot be used to distinguish samples in these three groups. Combination of these groups simplifies the analysis and increases the sample sizes, improving estimation. This improvement could be evidence for possibly considering these groups as a single population rather than three different populations. In fact, the overall estimated probability of misclassification when these three groups were combined was reduced from 24 to 6%, using the seven regional variables.

The distribution of D2(X; μ, Σ), where X is an arbitrary observation vector, is χ²_p (ref. 21, Theorem 3.3.3) if μ and Σ are known. It will be assumed that the large sample sizes enable the sample estimate x̄ (of μ) and S (of Σ) to be considered as the known quantities μ and Σ. The presence of unusual geochemical samples will be determined by examining the fit of D2(X; x̄, S) to a χ²-distribution. Standard Q-Q plots (ref. 22, pp. 198-99) are used to evaluate the distributional fit and determine the D2 threshold (i.e., the point where the Q-Q plot becomes nonlinear) for unusual values. If the plotted values are on a line of slope 1, only a single population is present; if more than a single line appears, several geochemical populations may be present. Alternatively, nonlinearity in the Q-Q plot could represent nonnormality of X or poor estimates of μ and Σ. Samples with values of D2 above the threshold will be geographically plotted. Geochemical subpopulations will be identified by a contiguous group of samples with unusual D2 values.
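The sketch below shows one way such a chi-square Q-Q plot can be produced; d2_values is a hypothetical array of D2(X; x̄, S) values and p the number of variables (seven for the regional set).

    # Chi-square Q-Q plot of D2 values; points leaving the slope-1 line
    # suggest a threshold for unusual samples.
    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    def chi2_qq(d2_values, p):
        d2_sorted = np.sort(np.asarray(d2_values))
        n = d2_sorted.size
        probs = (np.arange(1, n + 1) - 0.5) / n        # plotting positions
        q = stats.chi2.ppf(probs, df=p)                # theoretical chi-square quantiles
        plt.plot(q, d2_sorted, ".")
        plt.plot(q, q, "--")                           # slope-1 reference line
        plt.xlabel("chi-square quantile (df = %d)" % p)
        plt.ylabel("ordered D2")
        plt.show()

    # chi2_qq(d2_values, p=7)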

Figure 4 shows four points that have equal values of D2(X; x̄, S), where x̄ = (x̄_U, x̄_As)' and S is the observed covariance matrix. In fact, any point on the ellipse has the same D2 value. However, the simple Euclidean distance between the four points is not the same. Figure 4


[Table 2. Generalized squared distances between the five geologic groups (TPO, POQW, POQE, PGEB, and PGWC) for three different seven-variable models: the best model from DISCRIM (pooled covariance matrix), the best model from modified DISCRIM, and the best model from the backward procedure (group covariance matrices). The individual from/to entries are not reproduced here.]

Fig. 4. Four points equidistant from (x̄_U, x̄_As) having varying uranium concentrations (low, moderate, elevated, high).

illustrates how the positive correlation between uranium and arsenic would alter what might be considered an unusual sample. In general, because of the geochemical interrelationships in nature, it is meaningful to use the intervariable correlations to weight the observed deviations from the mean in defining anomalous samples. Unusual samples defined by this procedure in some cases may be relevant to uranium mineralization. However, it is important to note that samples with very low or moderate uranium values may have unusually high D2 values (Fig. 4). Although these samples may be meaningful in detailed analyses, attention is primarily restricted here to samples with uranium values above the median.

Regional Subpopulations

The selection of seven regional variables that enable discrimination in the Ogallala formation and Permian units suggests that these variables in some way characterize the regional geochemistry. Figures 5(a) and 5(c)


Fig. 5. Q-Q plots using standard and robust estimates of the covariance matrix of the regional variables: (a) Ogallala formation, standard covariance matrix; (b) Ogallala formation, robust covariance matrix; (c) modified Permian unit, standard covariance matrix; (d) modified Permian unit, robust covariance matrix.


show the Q-Q plots for D2(x_i; x̄, S) in the Ogallala and modified Permian units. The main body of the graph (D2 < 12) in Fig. 5(a) is reasonably linear, suggesting that the expected χ² distributional fit is appropriate. However, 48 samples (14%) have values of D2 above the threshold (D2 > 12), suggesting some lack of fit in the tail of the χ² distribution. Similarly, in Fig. 5(c), 71 samples (16%) in the modified Permian units have values of D2 above the threshold, showing a similar lack of fit.

The lack of fit in the regional variables may be the result of outlying samples causing poor estimates of μ and Σ. Figure 5(b) shows for the Ogallala formation that using a Huber [20] robust estimator for μ and Σ improves the fit somewhat for the low D2 values but accentuates the lack of fit for the large D2 values. Figure 5(d) shows the same general characteristic for the modified Permian samples. A possible interpretation of the accentuation of the nonlinearity is that the robust estimates minimize the influence of atypical samples in the estimates of μ and Σ, resulting in even more unusual D2 values for the atypical samples.
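For readers wanting to reproduce this comparison, the sketch below recomputes the D2 values with a robust location and scatter estimate. The paper uses a Huber-type estimator; since that exact estimator is not in common Python libraries, scikit-learn's minimum covariance determinant (MinCovDet) is substituted here as a readily available robust alternative, and regional_variables is a hypothetical (n, 7) array.

    # Robust recomputation of D2 for the Q-Q comparison.  MinCovDet is a
    # stand-in for the Huber-type estimator used in the paper.
    from sklearn.covariance import MinCovDet

    def robust_d2(X):
        mcd = MinCovDet().fit(X)        # robust mean vector and covariance matrix
        return mcd.mahalanobis(X)       # squared Mahalanobis (D2) distances

    # d2_robust = robust_d2(regional_variables)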

Figure 6 is a geographic plot of the samples with D2 > 12 in the Ogallala formation. Two areas, IA and IB, stand out as being somewhat contiguous regions with unusual D2 values. Region IA is an area with very low concentrations in many elements. Region IB consists of only seven samples, but these samples are very unusual, having extremely large D2 values. Both regions were identified by using an alternate set of variables (specific conductance, B, Ba, Li, Mg, Na, and total alkalinity) selected by another discriminant analysis variable selection method. Large subpopulations may influence estimates of the main population parameters that are used in the remaining analyses. Thus, samples in Region IA were deleted from the subsequent Ogallala analyses; the remaining geographic area is called the modified Ogallala (MTPO).

Figure 7 is a geographic plot of samples having D2 > 12 in the modified Permian units. No large contiguous group of samples is apparent, although several small unusual groups appear in the lower part of Fig. 7. Also, many unusual samples appear along the Whitehorse group-Blaine formation contact.

Atypical Uranium Subpopulations

Consider the uranium-related pathfinder variables U, As, Mo, and V as characterizing sandstone uranium mineralization geochemistry. Hypothetically, if there are areas having potential interest for uranium exploration within a geographic area, there will be at least two uranium populations (and two lines on the Q-Q plot). One population having the smaller D2 values would represent the uranium geochemistry of the background population. A second subpopulation with larger D2 values represents uranium-related values differing from the background population; these samples may be of interest in exploration.

It is necessary to estimate the elements of Σ by using the pairwise noncensored data for the uranium-related variables because much censoring results from the very low concentrations. One-half the laboratory detection limit is used for censored values to compute sample D2 values. Both of these procedures may cause non-χ² variation to be exhibited in the Q-Q plot.

Figure 8(a) shows the Q-Q plot for the Ogallala formation. The χ² distribution appears to fit well for 0 < D2 < 5, but at D2 ≈ 5 there appears to be a break in the plot. The frequency distribution of D2 in Fig. 8(b) also exhibits a separate population of large D2 values. Figure 9 is a geographic plot where the samples with D2 > 5 are noted. An "H" indicates high-uranium samples (>80th percentile), an "E" indicates elevated-uranium samples (50th to 80th percentile), and a "V" indicates moderate to low uranium. For the H and E samples the D2 value is displayed to the right of the plotted letter. Three contiguous regions of unusual samples are indicated (IIA, IIB, and IIC). Discussion of each region follows the Method III analysis.

The Q-Q frequency plots in Figs. 8(c) and 8(d) show that the modified Permian units exhibit unusual uranium geochemistry in that three populations appear to be present. Figure 10 displays samples where D2 > 15, which is the most extreme of the three populations. The "M" indicates moderate-uranium samples (20th to 50th percentile), and an "L" indicates low-uranium samples (<20th percentile). Two three-sample areas having low uranium are indicated (IID, IIE). In Fig. 10, a third area (IIF) having high uranium is indicated and was determined by plots of the second population with 5 < D2 < 15. Discussion of these regions follows the Method III analysis.

A Priori Uranium Populations

In most exploration applications it is of interest to analyze unknown areas by making an analogy to known areas of mineralization. Analysis by analogy is a common geologic tool, but it is often subjective in nature. It would be desirable to use the interelement relationships in Σ from a known mineralized region to identify samples that exhibit the same geochemical patterns in an


Fig. 6. Ogallala formation samples having extreme D2 values for the regional variables.


Fig. 7. Modified Permian unit samples having extreme D2 values for the regional variables.


Fig. 8. Q-Q and frequency plots using the incomplete-data covariance matrix of the uranium-related variables: (a) Ogallala formation, Q-Q plot; (b) Ogallala formation, frequency plot; (c) modified Permian unit, Q-Q plot; (d) modified Permian unit, frequency plot.


Fig. 9. Modified Ogallala formation samples having extreme D2 values for the uranium-related variables.


Fig. 10. Modified Permian unit samples having extreme D2 values for the uranium-related variables.


unknown area. However, an unknown area could be expected to have both concentration levels and variability different from those of the mineralized region. These differences could be caused by variation in the strength of the geochemical signal, which could depend upon the deposit depth and size in addition to groundwater flow patterns. Only the interelement relationships in an unknown area would hopefully remain similar to those in the mineralized area. Using generalized distance measures, an approach satisfying the above criteria is given below and illustrated on the Plainview data.

Let Σ_A and Σ_B denote the covariance matrices for the anomalous and background populations. The estimate of the background covariance matrix is Σ̂_B = S_B, where S_B = (s_ij) is the sample covariance matrix of the background population. Now assume that the sample interelement correlation matrix, R_A, is available from a known anomalous area (or R_A could be from a hypothesized geochemical model). Separate geochemical studies in known mineralized areas could be used to estimate R_A. To adapt R_A to the unknown area, let Σ̂_A = Q R_A Q, where Q = diag(s_jj^1/2) is the diagonal matrix of background sample standard deviations. Thus, Σ̂_A reflects the interelement correlations of the anomalous region and the expected variation in the background region. The matrix Σ̂_A is an a priori estimate of anomalous covariances in the background region.

It is possible to use Σ̂_A, Σ̂_B, and μ̂_B, where μ̂_B = x̄, the sample mean in the background region, to identify samples that have covariance patterns more similar to Σ̂_A than to Σ̂_B. Consider the difference

G2(X) = D2(X; μ̂_B, Σ̂_B) - D2(X; μ̂_B, Σ̂_A) .

If the distance from X to μ̂_B weighted by Σ̂_B^-1 is greater than the distance from X to μ̂_B weighted by Σ̂_A^-1, then X is more likely to be from the anomalous population. Figure 11 illustrates an example where the uranium/arsenic correlation is 0.2 in the background population and 0.6 in the anomalous population. The shaded area represents values of uranium and arsenic which would yield G2 > 0. Notice that in Fig. 11 there is a large overlap of the two populations because the correlations are somewhat similar. The overlap will be reduced if the correlations are quite different or if more variables having different correlations in the two populations are used.
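A compact sketch of this construction is given below. It builds Σ̂_A from the background standard deviations and an anomalous-area correlation matrix and returns G2 for every sample; the array names (background, R_anom) are hypothetical.

    # A priori anomaly score G2 for each sample in the background (unknown) area.
    # 'background' is an (n, 4) array of the pathfinder variables (possibly
    # log-transformed); 'R_anom' is the 4 x 4 correlation matrix from a known
    # mineralized area.  Both names are hypothetical.
    import numpy as np

    def g2_scores(background, R_anom):
        mu_B = background.mean(axis=0)
        S_B = np.cov(background, rowvar=False)             # estimated Sigma_B
        Q = np.diag(np.sqrt(np.diag(S_B)))                 # background std devs
        Sigma_A = Q @ R_anom @ Q                           # anomalous corr., background var.
        SB_inv = np.linalg.inv(S_B)
        SA_inv = np.linalg.inv(Sigma_A)
        dev = background - mu_B
        d2_B = np.einsum("ij,jk,ik->i", dev, SB_inv, dev)  # D2(X; mu_B, Sigma_B)
        d2_A = np.einsum("ij,jk,ik->i", dev, SA_inv, dev)  # D2(X; mu_B, Sigma_A)
        return d2_B - d2_A                                 # G2 > 0 favors the anomalous model

    # flagged = np.where(g2_scores(samples, R_south_texas) > 0)[0]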

Fig. 11. Regions where G2 > 0.

As an approximation to the sandstone uranium correlations that may be appropriate for the Plainview data, the sample correlations for observations with uranium values above the median were computed from data in the South Texas mineralized belt in the Fleming, Catahoula, and Jackson groups from latitude 28° to 29° N, longitude 97°30' to 98°30' W. The 84 samples result in the correlations shown in Table 3 part (c);

[Table 3. Sample correlations among the uranium-related variables U, As, Mo, and V for (a) the modified Ogallala, (b) the modified Permian, and (c) the South Texas anomalous region; the individual correlation entries are not reproduced here.]


Table 3 parts (a) and (b) are the correlations in the modified Ogallala and modified Permian units in the Plainview quadrangle. There are obvious differences in the correlation structures of the three regions.

Figures 12(a) and 12(b) show the empirical probability and frequency distribution of the G2 values in the modified Ogallala. As expected, the preponderance of samples have G2 < 0; that is, the fit for most samples to the modified Ogallala background population is preferable to the anomalous population. A geographic plot of the samples having G2 > 0 (90th percentile) appears in Fig. 13, which is coded in a similar manner to Fig. 9. The three contiguous regions (IIIA, IIIB, and IIIC) indicated on the plot are discussed in the next section.

The distributions of the G2 values for the modified Permian appear in Figs. 12(c) and 12(d). Samples having G2 > 1 (90th percentile) are geographically plotted in Fig. 14, which is coded in a manner similar to Fig. 10. The contiguous Region IIID as well as other regions are discussed in the next section.

INTERPRETATION OF RESULTS

A contiguous group of samples obtained in the Method II or III analysis must be evaluated with respect to the actual concentration values and percentiles of the samples. Recall from Fig. 4 that extreme D2 values may be obtained from any concentration level of uranium. The shaded area of Fig. 11 where G2 > 0 encompasses a region within the central portion of the background population. Additionally, extreme D2 or G2 values may result from samples having censored values that were assumed to be one-half the censoring point in the computation of D2 and G2. Only when a group of contiguous samples displays patterns of geochemical significance should areas be considered of interest. Table 4 gives the concentrations of the pathfinder elements for the areas identified from Method II and III analyses. Specific conductance is also given to evaluate the importance of dissolved solids. Region IIIB is a good example of an area that appears to be of little interest. Although the percentiles of vanadium, molybdenum, and arsenic generally correspond [a trend that matches the elevated correlations in Table 3 part (c)], the pattern of concentrations is neither consistently high nor low.

Figure 15 indicates Regions IIA-IIF and IIIA-IIID. Region IIA has elevated values in the pathfinder elements and is supported by Region IIIA. The two areas exhibit very high D2 and G2 values. Region IIB is south of IIA and IIIA and also exhibits elevated D2 values, but the concentration pattern is somewhat less favorable than that of Region IIA. The above three areas may result from leakage from the Dockum formation into the Ogallala in Area B (Fig. 15), which was found to be favorable in the Dockum for uranium [1]. Southeast of these three regions are the IIC and IIIC regions. Notice that IIIC encompasses IIC. The D2 values are slightly lower than those for IIA and IIB but are elevated. Additionally, the pathfinder concentrations in Table 4 are elevated. The above regions (IIA, IIB, IIC, IIIA, and IIIC) are encompassed within the large area identified [1] as anomalous by factor analysis. The southernmost area identified as anomalous by factor analysis was not found atypical in these analyses. It is important to estimate the extent of the anomalous regions as precisely as possible because the probable mineralized area is a component in resource estimation computations.

The interpretation in the modified Permian units of the three possible uranium-related populations exhibited in Figs. 8(c) and 8(d) is unclear. When each group is geographically plotted, there does not appear to be an overall spatial relationship separating the three populations. However, several unusual characteristics appear from the analyses. Region IID has extremely depressed concentrations of the pathfinder elements. Region IIE exhibits elevated molybdenum and vanadium, whereas Region IIF has elevated uranium, molybdenum, and vanadium (Table 4). These abrupt changes in uranium-related elements over a small geographic area may be of interest. A ten-sample selenium anomaly encompasses part of Regions IIE and IIF and extends to the south. In contrast with the modified Ogallala Method III analysis, the unusual G2 samples include very few high-uranium samples (Fig. 14). Many of the samples with G2 > 0 have censored data, including those in Region IIID. These censored values tend to artificially inflate the G2 values.

CONCLUSIONS

The methodology suggested here may prove useful in exploration for (1) identifying regionalized variables that distinguish between the geologic units in a region, (2) assigning samples of unknown origin to a geologic unit or verifying preliminary assignments, (3) identifying regions of unusual geochemistry for pathfinder elements, and (4) associating samples with either mineralization models or background populations. The analysis of the Plainview data suggests the following steps for accomplishing the above:

1. Perform adequate preprocessing of the data to ensure reasonably distinct geologic populations and approximate normality of the variables.


Fig. 12. Empirical probability and frequency plots of G2 values for the uranium-related variables: (a) modified Ogallala formation, empirical probability plot; (b) modified Ogallala formation, frequency plot; (c) modified Permian unit, empirical probability plot; (d) modified Permian unit, frequency plot.


Fig. 13. Modified Ogallala formation samples having positive G2 values for the uranium-related variables.


Fig. 14. Modified Permian unit samples having positive G2 values for the uranium-related variables.


Table 4. Concentration levels and population percentiles for samples having atypical uranium geochemistry

[For each region (IIA-IIF and IIIA-IIIC) the table lists the sample numbers, D2 values, concentrations (ppb) of U, As, Mo, and V, specific conductance (μmhos/cm), and the corresponding population percentiles; the individual entries are not reproduced here. Footnoted samples are those found similar to the uranium-related population from the Method III analysis or found atypical of the uranium-related population from the Method II analysis.]


Fig. 15. Summary map from multivariate distance analyses. (Legend entries on the original map include "anomalous factor scores (Amaral, 1979)" and "Area B (Amaral, 1979)".)


2. Compute the Mahalanobis distance between all populations and combine those that are close as judged by small distances (large theoretical misclassification probabilities).

3. Use appropriate variable selection methods (e.g., modified DISCRIM) to identify the variable sets that are candidates for the regional variables; select the regional variables from the candidate subsets based on the geochemistry of the region.

4. Identify regional subpopulations for separate analysis from samples having extreme D2 values for the regional variables.

5. Similarly, identify unusual regions possibly important to exploration from samples having extreme D2 values for the mineralization pathfinder elements.

6. Identify contiguous groups of samples that are associated with known mineralized populations rather than the background population (i.e., G2 > 0).

7. Evaluate concentration patterns of the pathfinder elements used in (5) and (6) to determine possible areas of interest to exploration.

Application of the above methodology to the Plainview quadrangle groundwater data indicated areas that were consistent with previous analyses and other new areas of unusual uranium geochemistry.

ACKNOWLEDGMENTS

This work was supported in part by the Oak Ridge Gaseous Diffusion Plant NURE Program. The authors greatly appreciate geologic input received from T. R. Butz of the NURE Program.

REFERENCES

1. E. J. Amaral, NURE Plainview Quadrangle, Texas, GJQ-001, Bendix Field Engineering Corporation, Grand Junction Operations, Grand Junction, Colo. (March 1979).

2. J. W. Arendt, T. R. Butz, G. W. Cagle, V. E. Kane, and C. E. Nichols, Hydrogeochemical and Stream Sediment Reconnaissance Procedures of the Uranium Resource Evaluation Project, K/UR-100, Oak Ridge Gaseous Diffusion Plant, Oak Ridge, Tenn. (1979).

3. Uranium Resource Evaluation Project, Procedures Manual for Groundwater Reconnaissance Sampling, K/UR-12, Oak Ridge Gaseous Diffusion Plant, Oak Ridge, Tenn. (March 1978); [GJBX-62(78)], U.S. Department of Energy, Grand Junction, Colo.

4. Uranium Resource Evaluation Project, Hydrogeochemical and Stream Sediment Reconnaissance Basic Data for Plainview NTMS Quadrangle, Texas, K/UR-101, Oak Ridge Gaseous Diffusion Plant, Oak Ridge, Tenn. (Dec. 29, 1978); [GJBX-92(78)], U.S. Department of Energy, Grand Junction, Colo.

5. A. J. Sinclair, Applications of Probability Graphs in Mineral Exploration, Special Vol. 4, The Association of Exploration Geochemists, Richmond Printers, Richmond, Canada, 1976.

6. G. W. Snedecor and W. G. Cochran, Statistical Methods, pp. 86-88, Iowa State University Press, Ames, Iowa, 1967.

7. S. S. Shapiro and M. B. Wilk, Biometrika 52, 591-611 (1965).

8. M. A. Stephens, J. Am. Stat. Assoc. 69, 730-37 (1974).

9. F. P. Agterberg, Geomathematics, Elsevier Scientific Publishing Co., New York, 1974.

10. P. A. Lachenbruch, Discriminant Analysis, Hafner, New York, 1975.

11. P. A. Lachenbruch and M. Goldstein, Biometrics 35, 69-85 (1979).

12. C. R. Rao, Linear Statistical Inference and Its Applications, pp. 435-513, Wiley, New York, 1965.

13. G. P. McCabe, Jr., Technometrics 17, 103-9 (1975).

14. W. J. Dixon and M. B. Brown, Biomedical Computer Programs P-Series, BMDP-79, University of California Press, Berkeley, Calif., 1979.

15. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Vol. 3, pp. 264-84, C. Griffin and Company, Ltd., London, 1966.

16. A. J. Barr, J. H. Goodnight, J. P. Sall, and J. T. Helwig, A User's Guide to SAS 76, SAS Institute, Inc., Raleigh, N.C., 1976.

17. J. D. F. Habbema and J. Hermans, Technometrics 19, 487-93 (1977).

18. R. H. Randles, J. D. Broffitt, J. S. Ramberg, and R. V. Hogg, J. Am. Stat. Assoc. 73, 379-84 (1978).

19. R. H. Randles, J. D. Broffitt, J. S. Ramberg, and R. V. Hogg, J. Am. Stat. Assoc. 73, 564-68 (1978).

20. P. J. Huber, "Robust Covariances," in Statistical Decision Theory and Related Topics II, Academic Press, New York, 1977.


21. T. W. Anderson, An Introduction to Multivariate Statistical Analysis, Wiley, New York, 1958.

22. R. Gnanadesikan, Methods for Statistical Data Analysis of Multivariate Observations, Wiley, New York, 1977.


Sensitivity Study on the Parameters of the Regional Hydrology Model for the Nevada Nuclear Waste Storage Investigations

R. L. Iman and H. P. Stephens
Sandia Laboratories, Albuquerque, New Mexico

J. M. Davenport
Texas Tech University, Lubbock, Texas

R. K. Waddell and D. I. Leap
U.S. Geological Survey, Denver, Colorado

ABSTRACT

In support of the Department of Energy's Nevada Nuclear Waste Storage Investigations, statistical methodology has been applied to an investigation of the regional hydrologic systems of a large area encompassing the Nevada Test Site (NTS) as a part of the overall evaluation of the NTS for deep geologic disposal of nuclear waste. Statistical techniques, including Latin hypercube sampling, were used to perform a sensitivity analysis on a two-dimensional finite-element model of the regional flow system, which was subdivided into 16 geohydrologic zones based upon a conceptual model of the flow system. Because some input variables in this model are correlated, the Latin hypercube sample was modified to include correlations between corresponding variables from zone to zone. Sensitivity analysis of the input-output relationship of the model is being used to (1) determine which input variables have a significant effect on model output; (2) define the relative importance of these input variables; and (3) ascertain the effect of input variable correlations on their rankings.

Results of the sensitivity analysis agree with the conceptual model of the hydrologic system and, more importantly, have led to its refinement. Analysis using ranked, rather than unranked, variables is more effective in determining which input variables have important effects on model output. Use of the modified Latin hypercube sample had a significant effect on the ranking of correlated variables and provided better agreement with the conceptual model.

INTRODUCTION

The primary objective of the Nevada Nuclear Waste Storage Investigations (NNWSI), funded by the Department of Energy's National Terminal Waste Storage Program, is the evaluation of areas within the Nevada Test Site (NTS) for their potential as sites for deep geologic disposal of commercially generated spent fuel or high-level radioactive waste. Because transport of radionuclides away from a nuclear waste repository by groundwater is a potential pathway to the biosphere, quantitative prediction of concentrations and rates of movement of radionuclides is essential to evaluation of the NTS. An effort is currently in progress to develop a solute transport model of a large area encompassing the NTS to predict radionuclide transport in groundwater. An initial element of the transport model development is refinement of the two-dimensional regional flow


model of the area. This paper describes statistical methodology being used to refine the model of regional hydrologic systems encompassing the NTS.

Because the hydrologic flow model is complex and involves use of large sets of input parameters, some of which are correlated, it is difficult to discern the sensitivity of model output to variability in input parameters. The objective of this effort is to perform a sensitivity analysis of the flow model to (1) determine which variables have a significant effect on model output, (2) define the relative importance of these variables, and (3) ascertain the effect of input variable correlation on their rankings. Statistical techniques, including Latin hypercube sampling, partial rank correlation, rank regression, and predicted error sum of squares, have been used to perform this sensitivity analysis. A unique aspect of the study was modification of the Latin hypercube sample to provide for dependences among input variables.

The approach to the sensitivity analysis is a modification of that outlined in detail in Iman, Helton, and Campbell [1] and includes the following steps:

1. Estimate ranges and frequency distributions of input variables based on available data.

2. Use Latin hypercube sampling techniques based on these distributions to generate specific values of input variables for the simulation model. Briefly, a Latin hypercube sample (LHS) is formed by dividing the range of values (mean ± 3 standard deviations) of each variable into n equal probability intervals and randomly selecting a value from each interval. This generates a sample of n observations of each variable having the desired distribution. The sample values generated in this manner are then randomly mixed for each variable and randomly paired with values of the other input variables to define an input vector, making sure that each value of each variable is used exactly once. For this study, a value of 100 was chosen for n. (A small illustrative sketch of this sampling scheme is given after this list.)

3. Modify the Latin hypercube sample to reflect correlations existing among some of the input variables. The procedure for doing this is discussed in the appendix.

4. Run the simulation code for each of the n sets of "observations" obtained from the modified LHS and check the n outputs for reasonableness.

5. Determine relative importance of the input variables using stepwise regression on both raw (untransformed) and rank-transformed variables. The stepwise regression is a linear regression procedure that builds a model by adding to the model the independent variable with the highest simple correlation (for the first step) or partial correlation (for later steps) until all statistically significant variables have been added. After each step, the significance level of all variables in the model is checked; if one is less than the required level (α = 0.05), that variable is removed.

If the relationships between input and output variables are monotonic but nonlinear, replacing the variables by their ranks will linearize their relationship and reduce the effects of extreme values. Monotonic relationships existed between input and output variables in this study, and rank transformations were therefore applied. This process helps alleviate the problem of trying to fit the raw data with a linear regression or of trying to formulate an approximate nonlinear model in the presence of many potentially important variables.
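The following is a minimal sketch of the basic Latin hypercube construction in step 2 for independent normal inputs; it stratifies the full normal range into n equal-probability intervals rather than truncating at ±3 standard deviations, and it omits the correlation-inducing modification of step 3. The means and standard deviations passed in are hypothetical.

    # Basic Latin hypercube sample for independent normal variables.
    import numpy as np
    from scipy import stats

    def latin_hypercube_normal(means, sds, n, seed=None):
        rng = np.random.default_rng(seed)
        k = len(means)
        sample = np.empty((n, k))
        for j in range(k):
            # one uniform draw inside each of n equal-probability strata
            u = (np.arange(n) + rng.uniform(size=n)) / n
            values = stats.norm.ppf(u, loc=means[j], scale=sds[j])
            sample[:, j] = rng.permutation(values)   # random pairing across variables
        return sample

    # Hypothetical example: two inputs, n = 100 runs
    # lhs = latin_hypercube_normal(means=[0.0, 1.0], sds=[1.0, 0.5], n=100)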

Because the statistical methodology is being applied at an early stage of development of the solute transport model, the results described are preliminary and for the flow model only. Whereas sensitivity analysis of the solute transport model may utilize radionuclide concentrations as output variables, this study utilized the hydraulic gradient at various geographic locations. These preliminary results are presented to demonstrate sensitivity analysis techniques and the effect of including dependences among input variables. Future effort toward refining the flow model will be devoted to (1) quantifying the effects of hydrologic anisotropy; (2) providing guidance for the optimum geographic areas to collect additional field data; and (3) understanding which parameters dominate the modeling of transport of radionuclides by groundwater.

DESCRIPTION OF THE REGION

The region modeled (Fig. 1) is located primarily in southern Nevada, within the Basin and Range Province. Elevations range from near sea level (Furnace Creek Ranch in Death Valley) to almost 3600 m at the top of Mount Charleston in the Spring Mountains. Precipitation, and therefore recharge to the groundwater system, is greater at higher elevations, such as the Spring Mountains, Sheep Range, and Pahute Mesa. Large springs discharge at Furnace Creek Ranch, Ash Meadows, and Oasis Valley. Discharge through evapotranspiration occurs at Alkalai Flat.

The stratigraphy and structure of the area are quite complex, reflecting extensive Precambrian and Paleozoic sedimentation (combined thickness of resulting rocks is greater than 11,000 m), Mesozoic folding and thrust faulting, and Cenozoic volcanism and block


Fig. 1. Diagram of the study area. Numbers designate zones; letters designate locations where simulated hydraulic gradients are analyzed.

faulting. Four major rock types occur in the area. The oldest rocks exposed in the area are slightly metamorphosed, low-permeability sandstones and shales ranging in age from Precambrian to Cambrian. Overlying these rocks are high-permeability limestones and dolomites, which are found in the eastern part of the area and range in age from Cambrian to Devonian. Winograd and Thordarson [2] called these limestones and dolomites the lower carbonate aquifer. The Eleana Formation, a low-permeability, metamorphosed shale, overlies the lower carbonate aquifer and is of Devonian to Mississippian age. Minor carbonate units of little hydrologic significance are next in the sequence. These are overlain by great thicknesses of Tertiary tuffs and


rhyolites originating from eruption centers in the western part of the modeled area. These volcanic rocks are generally less permeable than the rocks comprising the lower carbonate aquifer, but may locally be highly permeable because of fractures.

The most evident structural features are the numerous basins and ranges formed by normal faulting. These topographic features and the associated faults have a general north-south orientation. They significantly affect groundwater flow paths because they juxtapose blocks of high and low permeability, forming barriers, and cause tilting of bedding resulting in anisotropic transmissivity. Of lesser importance to regional hydrology, but possibly of great importance to local hydrology, are thrust faults, folds, and local structures associated with the eruption centers.

Paleozoic carbonate rocks, Tertiary volcanic rocks, and Tertiary and Quaternary alluvium are the three major aquifers in the area. In the eastern part, recharge occurs in the Spring Mountains and the Sheep Range. The water flows through the carbonate rocks; most of it discharges from springs at Ash Meadows. In the western part, most of the recharge occurs on Pahute Mesa. After flowing through volcanic rocks, the water either appears as spring discharge in Oasis Valley, flows into the alluvium underlying the Amargosa Desert west of Ash Meadows, or flows through the Eleana argillite into the tuffs and carbonate rocks underlying Yucca Flat. Water flowing beneath the Amargosa Desert is a mixture of water from these various sources; it discharges at Alkalai Flat, or in the vicinity of Furnace Creek Ranch.

GROUNDWATER FLOW MODEL

An equation describing steady-state, two-dimensional flow of groundwater in a porous medium is

∂/∂x_i (T_ij ∂h/∂x_j) + R(H - h) + Q = 0 ,   i, j = 1, 2,

where T_ij is the transmissivity tensor; R is the hydraulic conductivity of a confining layer above or below the aquifer; H is the head on the distal side of the confining layer; Q is the source-sink strength (positive for a source); h is the hydraulic head in the aquifer; and x is the Cartesian coordinate [3]. The Einstein summation convention is assumed. T_ij is a symmetric tensor that may be reduced to two principal components, T1 and T2, by rotation of the coordinate system so it is aligned with the principal transmissivity directions. T1, defined to be the greater of the pair of principal directions, is perpendicular to T2, and is in the direction θ measured counter-clockwise from north. For this study, R is assumed to be zero. The flow equation therefore reduces to

∂/∂x_i (T_ij ∂h/∂x_j) + Q = 0 .
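As an illustration of the principal-component description above, the sketch below assembles a zone's full 2 × 2 transmissivity tensor from sampled values of T1, T2, and θ; the exact sign convention relating θ to the model's x-y axes is an assumption here, and the numerical values in the example are hypothetical.

    # Assemble a symmetric transmissivity tensor from its principal components.
    import numpy as np

    def transmissivity_tensor(t1, t2, theta):
        """t1 >= t2 are the principal transmissivities; theta is the angle of the
        major axis (radians).  The rotation convention is an assumption."""
        c, s = np.cos(theta), np.sin(theta)
        R = np.array([[c, -s],
                      [s,  c]])
        return R @ np.diag([t1, t2]) @ R.T

    # Hypothetical zone: T1 = 1.2e-3 m^2/s, T2 = 1.2e-4 m^2/s, theta = 0.35 rad
    # T = transmissivity_tensor(1.2e-3, 1.2e-4, 0.35)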

This equation was solved for the geohydrologic conditions of the modeled region, using a modified version of a parameter estimation code which contains a method-of-weighted-residuals formulation of the Galerkin scheme [3, 4]. The model of the flow system contains 765 elements and 837 nodes. The region was divided into 16 zones such that each zone is assumed to be homogeneous and isotropic with respect to transmissivity and orientation of the transmissivity ellipse. The model is flexible enough so that recharge and discharge may be applied to all or only part of a zone. For example, recharge for Zone 4 was applied only to the Sheep Range and not to the entire zone.

INPUT PARAMETERS AND FREQUENCY DISTRIBUTIONS

Simulation of the regional hydrology requires that values for T1, T2, θ, and Q be provided for each zone. Estimates of the means, standard deviations, and types of distribution are given in Table 1. These estimates are based on various hydrologic and parameter estimation studies, as discussed below.

Available data include measurements of (1) discharge at Ash Meadows [5, 6], Furnace Creek Ranch [7], and Oasis Valley [8]; (2) transmissivity at several locations in and near NTS [2, 9]; and (3) precipitation at numerous sites within the study area. Estimates are available for discharge by evapotranspiration for Oasis Valley [8] and Alkalai Flats [6]. In addition, measurements of depth to water are available for numerous wells throughout the area [10]. These data were used as inputs for parameter estimation procedures [3] to evaluate the recharge and regional transmissivities (assuming isotropy) for the region, which, with measured discharges, provided estimates of the mean values of the distributions used in the Latin hypercube scheme. It is assumed that within most of the region, the direction of flow is approximately parallel to the structural grain. Therefore, transmissivities calculated by the parameter estimation procedure are assumed to represent T1. Conservative estimates of the standard deviations of T1 were used for those zones for which data were not available.


Table 1. Estimated means and standard deviations for the four input variables listed across zones

Each input variable is assumed to have a normal distribution

[For each of the 16 zones the table lists the mean (x̄) and standard deviation (s) of log10 T1, Δ log10 T, θ (radians), and Q; the individual entries are not reproduced here.]

•- Transmissivity units are m Is.

r = log1 0 (T1/T2).

- These values (m Is) are for point (e .g. , spring) discharge; other values in x column are fordistributed recharge or discharge (m/s).

Freeze [11] summarized several studies that strongly suggest that hydraulic conductivity is log-normally distributed. It is assumed in this study that transmissivity, the product of hydraulic conductivity and the thickness of the medium, is also log-normally distributed. Analysis of hydrologic test data for Pahute Mesa [9] suggests that this assumption is valid. Therefore, both T1 and T2 were assumed to be log-normally distributed.

By definition, T1 is greater than T2. The degree of anisotropy was expressed as

\Delta \log T = \log T_1 - \log T_2 = \log (T_1 / T_2) .

Because few experimental data on the degree of anisotropy are available, it was assumed for this study that Δ log T is normally distributed, as the above relation requires. Although ratios of vertical to horizontal hydraulic conductivity may be quite high (10:1 to 100:1), ratios of the horizontal components of conductivity may be much smaller (1:1 to 10:1). Estimates of the ratio were made for each zone based on the "strength" of the structural grain of each zone. In other words, if the predominant structural feature in a zone is normal (block) faulting, with accompanying tilting of beds and fracturing, the degree of anisotropy is probably high. If a zone is structurally complex, without a dominant structural grain, the degree of anisotropy is probably low. The transmissivities T1 and T2 are correlated because they are components of the same property given in two orthogonal directions. The standard deviation for the degree of anisotropy was


calculated from the standard deviations estimated for T1 and T2, an assumed correlation of 0.6 between T1 and T2, and the expression for the variance of the sum of two random variables. The estimates of Δ log T and T1 were used to generate the values of T2 used in the simulations.
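Written out, the variance expression referred to above is the standard identity for a difference of two correlated random variables (stated here for completeness, with the 0.6 correlation assumed above):

\sigma^2_{\Delta \log T} = \sigma^2_{\log T_1} + \sigma^2_{\log T_2} - 2\rho\,\sigma_{\log T_1}\sigma_{\log T_2}, \qquad \rho = 0.6 ,

so the standard deviation of Δ log T follows directly from the standard deviations assigned to log T1 and log T2.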

Orientation of the transmissivity ellipse was estimated from the geologic structure, with the ellipse oriented with its long axis parallel to the structural grain, and is expressed in radians counterclockwise from north. Estimates of the standard deviation are more conservative for zones of greater complexity. Orientation of the ellipse is determined by the strike of bedding where bedding is tilted, and by the strike of fracturing where bedding is nearly horizontal. The zones are chosen to be as homogeneous as possible with respect to strike of bedding or strike of dominant fractures. Distributions of strikes of bedding or of dominant fractures, mapped from LANDSAT images, are symmetrical and are assumed to be normally distributed.

Estimates of the standard deviation for the recharge/discharge coefficients, Q, were based on the magnitude of the mean. Greater uncertainty was assigned to those means determined from parameter estimation procedures than to those determined from field measurements. The Q's that are important to deep, regional groundwater systems are long-term averages of recharge or discharge. Even though short-term precipitation (storm) events and the resulting runoff are approximately log-normally distributed, long-term averages of many events should, by the central limit theorem, be normally distributed. Because of conservation-of-mass constraints, the Q's are correlated from zone to zone: an increase in recharge (positive Q) in one zone causes an increase in discharge (negative Q) in another zone. The correlations between the Q's were estimated from knowledge of the physics of flow and the hydrology of the region, not from data. The correlation estimates are given in Table 2.

MODIFICATION OF THE LATIN HYPERCUBE SAMPLE TO INCORPORATE CORRELATION AMONG INPUT VARIABLES

Correlations among the recharge-discharge rates from zone to zone must be taken into consideration when modeling groundwater flow. The LHS scheme for selecting input values for computer codes has many desirable properties (outlined in [12]); however, the LHS scheme assumes the input variables to be statistically independent. Therefore, it is necessary to modify the LHS to allow a prespecified correlation structure to exist among the input variables. An outline of this approach follows.

Suppose the M input variables to be sampled are denoted by X1, X2, ..., XM. Furthermore, suppose that p (p < M) of these variables are assumed to be correlated, and the remaining M - p variables form a mutually independent set of variables. Denote the correlated variables by Xi1, Xi2, Xi3, ..., Xip and define the vector X_i = (Xi1, Xi2, Xi3, ..., Xip)'. Because the LHS scheme assumes that all the variables in X are mutually independent, the problem is to take the subset X_i and transform it so that its members have a prespecified correlation structure. Let the correlation structure for the p variables in X_i be given by the following p × p, symmetric, positive definite correlation matrix:

    P = | 1     ρ12   ρ13   ...   ρ1p |
        | ρ21   1     ρ23   ...   ρ2p |
        | ρ31   ρ32   1     ...   ρ3p |
        | ...                     ... |
        | ρp1   ρp2   ρp3   ...   1   |

where -1 < ρij < 1. If the vector X_i is assumed to have a p-variate normal distribution with mean μ and diagonal variance-covariance matrix V, then the desired transformation that will modify the LHS to produce the correlated variables is given in the following theorem.

Theorem. Let X_i ~ N_p(μ, V) as defined in the above statements. Let P be a p × p symmetric, positive definite correlation matrix. Then there exists a p × p matrix A such that Y_i = A(X_i - μ) + μ and each Y_ij, j = 1, 2, ..., p, will be normally distributed with mean μ_j and variance σ_j² (the same mean and variance as X_ij); furthermore, Y_i will have correlation structure P.

Proof of this theorem and the derivation of the matrix A are given in the Appendix.
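As an illustration of the transformation the theorem describes, the following sketch (a minimal example only, not the code used in the study; NumPy, the example dimensions, and the parameter values are assumptions) builds A = V^(1/2) C D^(1/2) C' V^(-1/2) from a target correlation matrix and applies it to independently sampled normal vectors:

import numpy as np

def correlate_sample(X, P, mu, sigma):
    """Impose correlation matrix P on the rows of X (k x p), keeping each
    column's mean mu[j] and standard deviation sigma[j] unchanged."""
    eigvals, C = np.linalg.eigh(P)          # P = C D C', C orthogonal
    D_half = np.diag(np.sqrt(eigvals))      # D^(1/2)
    V_half = np.diag(sigma)                 # V^(1/2) (V is diagonal)
    V_half_inv = np.diag(1.0 / sigma)
    A = V_half @ C @ D_half @ C.T @ V_half_inv
    return (X - mu) @ A.T + mu              # Y_i = A (X_i - mu) + mu

# Example: 100 independent normal vectors for 3 correlated inputs
rng = np.random.default_rng(0)
mu = np.array([0.0, 0.0, 0.0])
sigma = np.array([1.0, 2.0, 0.5])
P = np.array([[1.0, 0.6, -0.3],
              [0.6, 1.0, 0.2],
              [-0.3, 0.2, 1.0]])
X = rng.normal(mu, sigma, size=(100, 3))
Y = correlate_sample(X, P, mu, sigma)
print(np.corrcoef(Y, rowvar=False))         # close to P for large samples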

The following comments relate to obtaining correlated variables from a LHS:

1. The above procedure is valid only when P is a positive definite correlation matrix and X_i has a multivariate normal distribution.

2. The above approach destroys the property, inherent in the LHS, of having a value associated with each of the equal-probability intervals defined over the range of each input variable.


Table 2. Intuitive estimate of the correlation structure for the Qi among the 16 zones

Zone

1 1.0

2 .9 1.0

3 .6 .5 1.0

4 .7 .7 .5 1.0

5 .8 .8 .5 .6 1.0

6 .5 .5 .3 .6 .6 1.0

7 -.9 -.8 -.5 -.4 -.7 -.4 1.0

8 -.7 -.6 -.4 -.6 -.7 -.6 .8 1.0

9 -.5 -.7 -.2 -.8 -.8 -.8 .7 .9 1.0

10 .2 .1 -.1 .1 .1 .1 -.1 -.3 -.5 1.0

11 .1 .1 .1 .1 .1 .1 -.3 -.3 -.3 .6 1.0

12 -.7 -.6 -.4 -.4 -.3 -.3 .6 .6 .6 -.1 -.8 1.0

13 -.8 -.8 -.6 -.8 -.8 -.8 .8 .9 .9 -.6 .5 .6 1.0

14 .6 .6 .4 .8 .7 .7 -.8 -.8 -.9 .6 .6 -.4 -.9 1.0

15 .6 .6 .2 .9 .7 .7 -.8 -.7 -.9 .5 .5 -.4 -.9 .9 1.0

16 -.7 -.6 -.2 -.6 -.6 -.6 .7 .9 .8 -.5 -.5 .8 .7 -.7 -.6 1.0

Zone 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

3. The above approach is not the only way to solve the problem. For example, a solution using rank correlations could preserve the integrity of the original LHS and remove the normality assumption on X_i. However, before using such an approach, one should investigate its effect on the properties of the LHS [12,13].

As mentioned in comment (1) above, the correlation matrix P must be positive definite. The matrix given in Table 2, representing intuitive estimates for these correlations, is not positive definite. Therefore, this matrix was modified slightly to generate a positive definite matrix by the following procedure:

1. The matrix is decomposed into its eigenvalues and eigenvectors. Since the negative eigenvalues are near zero, they are arbitrarily set equal to +0.05, and an adjusted correlation matrix is constructed.

2. The adjusted matrix now has diagonal values greater than unity. These diagonal values are set equal to one, and the process is repeated.

This procedure produced a positive definite matrix (Table 3) after four iterations. Element-by-element comparison of the matrices given in Tables 2 and 3 shows good agreement between the two.
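A minimal sketch of this adjustment, assuming NumPy and a generic symmetric input matrix (the 0.05 floor mirrors the description above; the iteration cap is an assumption):

import numpy as np

def adjust_to_positive_definite(P, floor=0.05, max_iter=20):
    """Iteratively raise near-zero negative eigenvalues and restore a unit
    diagonal until the matrix is positive definite."""
    P = P.copy()
    for _ in range(max_iter):
        eigvals, C = np.linalg.eigh(P)
        if eigvals.min() > 0.0:
            return P                          # already positive definite
        eigvals[eigvals <= 0.0] = floor       # step 1: lift negative eigenvalues
        P = C @ np.diag(eigvals) @ C.T        # rebuild the adjusted matrix
        np.fill_diagonal(P, 1.0)              # step 2: reset diagonal to one
    raise RuntimeError("matrix did not become positive definite")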

Application of this procedure to the original LHS, generated using the distributions given in Table 1, produced a modified LHS with the actual correlations among the Qi given in Table 4. These correlations are very similar to those desired (Table 3).


Table 3. Adjusted correlation matrix (which is positive definite)

Zone

1 1.0000

2 .9971 1.0000

3 .6120 .5156 1.0000

4 .6521 .6868 .4694 1.0000

5 .7521 .7924 .4866 .6395 1.0000

6 .4775 .5096 .31)10 .6530 .6228 1.0000

7 - .8391 - . 7 7 5 5 - . 4 7 9 6 - .5155 - .7413 - .4492 1.0000

8 - .7012 - . 6 3 3 8 - . 4 0 0 8 - .6139 - .7070 - .6302 .7917 1.0000

9 - .6099 - .6704 - . 2 2 5 9 - .7636 - .7494 - .7759 .6736 .8649 1.0000

10 .1431 .1009 - . 0 9 8 2 .1658 .1448 .1561 - . 1 7 0 6 - . 3274 - .4474 1.0000

11 .1627 .1193 .1037 .1375 .0699 .1179 - . 3 2 2 3 - . 3 2 3 5 .3671 .5797 1.0000

12 - .6252 - . 5 7 7 8 - . 3 8 6 5 - . 4003 - .3515 - .3108 .6090 .5990 .5115 - .1617 - . 6 7 8 5 1.0000

13 - .7813 - . 7 6 2 0 - . 5 4 8 0 - .7739 - .7859 - .7240 .776ft .8337 .8827 - .5032 - . 4656 .5894 1.0000

14 .6152 .5895 .3896 .7333 .6986 .6652 - . 7 2 5 2 - . 7 9 1 6 - . 8838 .5721 .5269 - . 4 6 7 7 - . 9 3 5 7 1.0000

15 .5975 .6338 .2469 .7954 .6792 .6710 - . 7 0 4 9 - . 7 0 4 5 - .8884 .4754 .4446 - . 4 2 5 5 - . 8 7 9 2 .9423 1.0000

16 - .6925 - . 6 0 8 1 - . 2162 - . 5 5 4 8 - .5750 - .5614 .6695 .8641 .8130 - . 4470 - . 4 9 5 9 .7619 .7702 - . 6 9 6 1 .6465 1.0000

Zone— 1 2 3 4 5 6 7 8 9 ' 10 11 12 13 14 15 16


Table 4. Actual correlation matrix resulting from modifying the LHS based on the correlation matrix in Table 3

(Lower-triangular 16 × 16 matrix of the correlations among the Qi actually realized after the modification; element by element, the values closely match those of Table 3.)


REGRESSION MODEL

The sensitivity analysis is based on performing a stepwise regression, using the hydraulic gradients at several geographic locations as the dependent variables and functions of the input parameters to the flow simulations as the independent variables. The hydraulic gradient is of interest because it is proportional to the velocity of groundwater flow. Terms included in the regression analysis consist of the linear and quadratic terms of all parameters (T1, T2, θ, and Q); the product of T1 and T2, zone by zone; and the quotient Q/T1, using T1 for the zones close to the point of interest and Q for zones having a nonzero mean Q. The rationale for including these additional terms follows.

T1 × T2

The directional aspects of transmissivity may be represented by an ellipse

\frac{x^2}{T_x} + \frac{y^2}{T_y} = 1 ,

where T_x and T_y are the principal transmissivities oriented in the x and y directions. An average transmissivity may be defined as one satisfying the expression for a circle

\frac{x^2}{T_{av}} + \frac{y^2}{T_{av}} = 1 ,

such that the areas of the circle and the ellipse are equal. The resulting expression is

T_{av} = \sqrt{T_x T_y} .

The interaction term T1 × T2 therefore was included in the regression model to account for flow not in a principal direction.
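The area equality behind the last expression, spelled out (a one-line check using the ellipse and circle written above):

\pi \sqrt{T_x}\,\sqrt{T_y} = \pi\, T_{av} \quad\Longrightarrow\quad T_{av} = \sqrt{T_x T_y} ,

so the square of the average transmissivity is the product T1 T2, which is the quantity the interaction term carries.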

Q/T1

Darcy's law, which describes the flow of fluids through porous media, may be written for one-dimensional flow and unit cross-sectional area as

q = -\frac{T}{b}\,\frac{dh}{dx} ,

where q is the volume rate of flow, b is the thickness of the aquifer, and the other terms are as defined previously. Rearranging,

\frac{dh}{dx} = -\frac{q\,b}{T} ,

which suggests the use of the term Q/T1 in the regression analysis.
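As an illustration only (not the study's own stepwise code; pandas and the column-naming scheme are assumptions), the candidate regressor set described above could be assembled as follows:

import pandas as pd

def candidate_terms(df, t1_cols, t2_cols, theta_cols, q_cols, q_over_t1_pairs):
    """Build linear, quadratic, T1*T2, and Q/T1 regressors from a table of
    sampled inputs; q_over_t1_pairs lists (q_col, t1_col) combinations."""
    X = pd.DataFrame(index=df.index)
    for col in t1_cols + t2_cols + theta_cols + q_cols:
        X[col] = df[col]                      # linear terms
        X[col + "^2"] = df[col] ** 2          # quadratic terms
    for t1, t2 in zip(t1_cols, t2_cols):
        X[t1 + "*" + t2] = df[t1] * df[t2]    # anisotropy interaction, zone by zone
    for q, t1 in q_over_t1_pairs:
        X[q + "/" + t1] = df[q] / df[t1]      # Darcy-motivated ratio terms
    return X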

RESULTS AND DISCUSSION

The basic LHS input selection consisted of 100 vectors, each of which contained values of the 64 variables previously defined. Two such samples were generated, and the model was run on each. The first sample assumed independence of all input variables. The second sample was constructed identically to the first, except that the Qi were multiplied by the matrix A mentioned in the above theorem, which in turn was based on the correlation matrix given in Table 3.

Results of the stepwise regression are given in Tables 5 (ranked variables) and 6 (raw variables). Subscripts associated with the variables identify the zone number from Fig. 1. Variables are listed in these tables in the order in which they appeared in the stepwise analysis. Also included with each variable appearing in the final regression equation is the value of its standardized regression coefficient, which can be used to determine the relative importance of the independent variables. That is, the greater the absolute value of the regression coefficient, the more important the variable. For example, a variable with a standardized regression coefficient of 0.8 has about twice the influence on the dependent variable as a variable with a standardized regression coefficient of 0.4.

When certain locations are compared, some similarities occur among the variables selected in Table 5. These similarities are expected, since locations A and B are both in Zone 1, and locations C, D, and E are in Zone 8. The entries in Table 5 can be compared to see the effect of correlated input variables on the determination of important variables and zones. In particular, since the Qi were the only variables correlated from zone to zone, it is worthwhile to examine the subscripts associated with the Qi to determine which zones are regarded as important. These subscripts are listed in Table 7 in the order in which the corresponding Q first appeared in the stepwise analysis.

While some agreement within Table 7 is expected, several differences are noticeable when comparisons are made between the correlated-Qi group and the uncorrelated-Qi group. For example, Zone 7 is designated as the most important zone in four of the five locations in the correlated group, yet fails to appear in the uncorrelated group. Likewise, Zone 13 is chosen for all locations within the uncorrelated group, but for only one location in the correlated group. Also noticeable is the absence


Table 5. Variables selected by stepwise regression on ranks

(For each location, A through E, the variables are listed in the order in which they entered the stepwise analysis, together with their standardized regression coefficients and the final R², first for the case of correlated Qi and, in the continuation, for the case of uncorrelated Qi. Subscripts identify zone numbers from Fig. 1.)

*This variable was replaced at a later step in the stepwise analysis, as indicated by the connecting arrow.


Table 6. Variables selected by stepwise regression on raw data

(Same layout as Table 5: for each location, A through E, the variables are listed in the order in which they entered the stepwise analysis, with their standardized regression coefficients and the final R², for the correlated-Qi case and, in the continuation, for the uncorrelated-Qi case.)

*This variable was replaced at a later step in the stepwise analysis, as indicated by the connecting arrow.


Table 7. Zones for which variations in Q cause significant changes in simulated gradient at various locations

(For each location, A through E, the zone numbers are listed in the order in which the corresponding Q first appeared in the stepwise analysis, separately for the correlated-Qi case and the uncorrelated-Qi case.)

*Variable Q7 was replaced by variable Q1 at a later step in the regression analysis.

of Zone 12 from the uncorrelated group, while it appears three times in the correlated group. Comparing these results with Fig. 1, it is noted that analyses of the correlated Qi tend to select zones that are geographically near the points of interest.

While these results indicate which zones influence the dependent variable, the Qi, although important, are not always the most dominant variables, as may be seen by comparing the standardized regression coefficients given in Table 5. Hence, it would be reasonable to expect more dramatic changes if correlations existed among the more dominant variables.

Another observation is that the regression models using ranked data (Table 5) tend to assign little importance to the Q/T ratios, while the analyses using raw data (Table 6) favor these ratios. Part of the reason the two regression procedures disagree on the selection of important variables is that the output is nonlinear in form. The raw regression tries to fit a linear model to the nonlinear output, whereas the rank regression first linearizes the data via a rank transformation and then fits a linear model. In other words, the regression model for raw data, as presently formulated, is insufficient for nonlinear output.
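A minimal sketch of the rank-regression idea (SciPy and scikit-learn are assumptions; the study used its own stepwise code): rank-transform the inputs and the output, then fit an ordinary linear model to the ranks, which handles any monotone nonlinear response well.

import numpy as np
from scipy.stats import rankdata
from sklearn.linear_model import LinearRegression

def to_ranks(X):
    """Rank-transform each column of a 2-D array."""
    return np.column_stack([rankdata(col) for col in X.T])

# Example with a monotone nonlinear response: ranks linearize it.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.exp(2.0 * X[:, 0]) + 0.1 * rng.normal(size=100)
Xr, yr = to_ranks(X), rankdata(y)
model = LinearRegression().fit(Xr, yr)
print(model.score(Xr, yr))          # near 1: the rank model fits the ranks well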

Because the sensitivity analysis was performed on a mathematical representation of a conceptual model of the study area, it is interesting to compare the variables chosen as important by the analysis with those intuitively expected to be the most important. As an example, the results for location A at Yucca Mountain are discussed. The hydrologic characteristics of the region thought to affect Yucca Mountain are as follows: recharge occurs on Pahute Mesa, and the resulting groundwater flows from the mesa to a discharge area at Oasis Valley, and also south through the Timber Mountain and Yucca Mountain areas into the Amargosa Desert. Little water flows into Yucca Flat because the Eleana Formation of low transmissivity (Zone 5) separates Pahute Mesa from Yucca Flat. The conceptual model therefore suggests that the amount of recharge on Pahute Mesa (Q1), the amount of discharge in Oasis Valley (Q7), the transmissivities at Yucca Mountain (T1 and T2 for Zone 1), and perhaps the transmissivities in the Amargosa Desert (T1 and T2 for Zone 8) could be important in the regression model.

The results of the rank and unranked analyses for location A are different, as discussed above, but not contradictory. One or both of T1 and T2 for Zones 1 and 8 appear in the list of important variables, as well as


the recharge term for Zone 1. This agrees with intuitive reasoning. The square of the term representing discharge from Oasis Valley (Q7), the second variable chosen for the correlated rank regression, was replaced at a later stage in the regression by Q1.

Terms not intuitively expected in the results are those representing the transmissivity of Zone 5. Previous sensitivity experience with the model, using values of head rather than gradient, showed that simulated heads are relatively insensitive to the transmissivity values assigned to low-transmissivity zones when they are surrounded by zones of higher transmissivity. However, the current results indicate that the hydraulic gradient may be sensitive to values of T assigned to low-transmissivity zones. Reexamining the conceptual model, this result has physical significance. Although the amount of flow from Pahute Mesa into Yucca Flat is small, changes in that flow, corresponding to different assigned values of the transmissivity of Zone 5, will change the amount of water available to flow through Yucca Mountain. This relation exists because the recharge at Pahute Mesa and the discharge at Oasis Valley are fixed for each simulation. Increasing the transmissivity of Zone 5 will decrease flow through Yucca Mountain, resulting in a lower hydraulic gradient. The negative sign of the regression coefficient for the T1 × T2 term of Zone 5 is consistent with this reasoning.

The term Q5 is the most important term in the correlated raw regression model and is absent from the rankings for other locations. Zone 5 is near the center of the study area, and the intuitive estimates of the correlations of Q5 with the other Qi are high. The mean of Q5 was chosen to be zero because there is no evidence of recharge or discharge from Zone 5; if recharge or discharge occurs, it is minimal and of minor hydrologic significance. The presence of Q5 in the regression model therefore appears to be an artifact of the intuitive correlation matrix. Thus, although the incorporation of dependences among the input variables is an important consideration, construction of the correlation matrix must be done carefully.

In establishing values of the input variables, it was assumed that groundwater flow was parallel to T1. The inclusion of various T2 and T1 × T2 terms in the regression models indicates that this assumption is not valid for every zone. One of the goals of this study was to determine whether anisotropy would have a significant effect on simulations of groundwater flow. These results indicate that anisotropy does affect model output; the importance of this effect is yet to be determined.

CONCLUSIONS

1. Modification of the LHS to include dependences among variables causes pronounced changes in the results of regression analyses on raw variables. These changes are realistic in terms of what is known about the hydrology of the modeled area. The correlation matrix must be carefully constructed to ensure meaningful results.

2. Although Q/T terms were important in the analyses of raw variables, as predicted by theory, they were relatively unimportant in the analyses of ranked variables. As the independent variables in the regression analyses are presently formulated, analyses of ranked variables are more effective than analyses of unranked, or raw, variables in determining which variables have a significant effect on model output. Regression models based on raw variables may have better predictive ability because they retain more information, although their predictive ability is not as good as that of the finite-element model used to generate the set of dependent variables.

3. Sensitivity analyses of this type are analyses of conceptual models which have been expressed as mathematical models. They are not tests of the numerical accuracy of the mathematical models nor of the validity of the conceptual models. This type of analysis is, however, a useful tool to (1) help refine conceptual models and (2) indicate the types of data that will significantly improve the predictive abilities of the mathematical model, and those that will not. Sensitivity studies will therefore be of even greater value in modeling the transport of radionuclides, where the number of variables is greatly increased and the available data base is not yet proportionately larger.

REFERENCES

1. R. L. Iman, J. C. Helton, and J. E. Campbell, Risk Methodology for Geologic Disposal of Radioactive Waste - Sensitivity Analysis Techniques, SAND78-0912, Sandia Laboratories, Albuquerque, N. Mex., 1978.

2. I. J. Winograd and W. Thordarson, Hydrogeologic and Hydrochemical Framework, South-Central Great Basin, Nevada-California, with Special Reference to the Nevada Test Site, U.S. Geological Survey Professional Paper 712-C, 1975.


3. R. L. Cooley, "A Method of Estimating Parameters and Assessing Reliability for Models of Steady-State Groundwater Flow, 1. Theory and Numerical Properties," Water Resour. Res. 13: 318-24 (1977).

4. R. L. Cooley, "A Method of Estimating Parameters and Assessing Reliability for Models of Steady-State Groundwater Flow, 2. Application of Statistical Analysis," Water Resour. Res. 15: 603-17 (1979).

5. W. W. Dudley and J. D. Carson, Effect of Irrigation Pumping on Desert Pupfish Habitats in Ash Meadows, Nye County, Nevada, U.S. Geological Survey Professional Paper 927, 1976.

6. G. E. Walker and T. E. Eakin, Geology and Groundwater of Amargosa Desert, Nevada-California, Nevada Department of Conservation and Natural Resources, Ground-Water Resources - Reconnaissance Series Report 14, 1963.

7. G. A. Miller, Appraisal of the Water Resources of Death Valley, California-Nevada, U.S. Geological Survey Open-File Report 77-728, 1977.

8. T. E. Eakin, S. L. Schoff, and P. Cohen, Regional Hydrology of a Part of Southern Nevada - A Reconnaissance, U.S. Geological Survey TEI-833, Open-File Report, 1963.

9. R. K. Blankennagel and J. E. Weir, Jr., Geohydrology of the Eastern Part of Pahute Mesa, Nevada Test Site, Nye County, Nevada, U.S. Geological Survey Professional Paper 712-B, 1973.

10. D. I. Leap, R. K. Waddell, and W. Thordarson, "Hydrologic Data Base for Solute Transport Model of Nevada Test Site and Vicinity" (in preparation).

11. R. A. Freeze, "A Stochastic-Conceptual Analysis of One-Dimensional Groundwater Flow in Nonuniform Homogeneous Media," Water Resour. Res. 11: 725-41 (1975).

12. M. D. McKay, W. J. Conover, and R. J. Beckman, "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code," Technometrics 21: 239-45 (1979).

13. R. L. Iman and W. J. Conover, "Small Sample Sensitivity Analysis Techniques for Computer Models, with an Application to Risk Assessment" (in preparation).

14. F. A. Graybill, An Introduction to Linear Statistical Models, Vol. 1, McGraw-Hill, New York, 1961.


APPENDIX

Theorem. Let X_i ~ N_p(μ, V) as defined in the above statements. Let P be a p × p symmetric, positive definite correlation matrix. Then there exists a p × p matrix A such that Y_i = A(X_i - μ) + μ and each Y_ij, j = 1, 2, ..., p, will be normally distributed with mean μ_j and variance σ_j² (the same mean and variance as X_ij); furthermore, Y_i will have correlation structure P.

Proof. Let X_i ~ N_p(μ, V) as defined in the theorem. Let

V^{1/2} = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_p)

and let V^{-1/2} be the inverse of V^{1/2}. Then it is well known that V^{-1/2}(X_i - μ) ~ N_p(0, I), where I is a p × p identity matrix and 0 is a vector of zeros.

Now consider the matrix P. There exists an orthogonal matrix C such that

C^T P C = D ,

where D is a diagonal matrix of the eigenvalues of the matrix P and the columns of C are the eigenvectors of P [14, p. 5]. Let D^{1/2} be the diagonal matrix of square roots of the eigenvalues. Note that

P = C D C^T = C D^{1/2} D^{1/2} C^T .

Let A = V^{1/2} C D^{1/2} C^T V^{-1/2}; then

A V A^T = V^{1/2} C D^{1/2} C^T V^{-1/2}\, V \, V^{-1/2} C D^{1/2} C^T V^{1/2} = V^{1/2} P V^{1/2} .

Note that V^{1/2} P V^{1/2} is the variance-covariance matrix of Y_i, and P is its correlation matrix. This completes the proof.


Evaluation of ECC Bypass Data with a Nonlinear Constrained MLE Technique*

Thomas A. Bishop, Robert P. Collier, and Robert E. Kurth
Battelle Columbus Laboratories

Columbus, Ohio

ABSTRACT

Recently, Battelle's Columbus Laboratories have been involved in scale-model tests of emergency core cooling (ECC) systems for hypothesized loss-of-coolant accidents in pressurized water reactors (PWR). These tests are intended to increase our understanding of ECC bypass, which can occur when steam flow from the reactor core causes the emergency coolant to bypass the core and flow directly to the break. One objective of these experiments is the development of a correlation that relates the flow rate of water penetrating to the core to the steam flow rate. This correlation is derived from data obtained from a 2/15-scale model PWR at various ECC water injection rates, subcoolings, pressures, and steam flows. The general form of the correlation being studied is a modification of the correlation first proposed by Wallis. The correlation model is inherently nonlinear and implicit in form, and the model variables are all subject to error. Therefore, the usual nonlinear analysis techniques are inappropriate. A nonlinear constrained maximum-likelihood-estimation technique has been used to obtain estimates of the model parameters, and a Battelle-developed code, NLINMLE, has been used to analyze the data. The application of this technique is illustrated by sample calculations of estimates of the model parameters and their associated confidence intervals for selected experimental data sets.

INTRODUCTION

Battelle's Columbus Laboratories have been involved in scale-model tests of emergency core cooling (ECC) systems for hypothesized loss-of-coolant accidents (LOCA) in pressurized water reactors (PWR). The accident of interest occurs when a break takes place in a cold-leg pipe, causing the reactor core to become uncovered.

One objective of these tests is to increase our understanding of ECC bypass. Bypass of the ECC water occurs when the steam flow from the uncovered reactor core prevents the injected ECC water from reaching the reactor core. Instead, the steam flow causes the ECC water to flow directly to the opening in the broken cold leg.

*This research was sponsored by the U.S. Nuclear Regulatory Commission under Contract No. NRC-04-76-293-04.

Experimental data has been obtained from a 2/15-scale model of a prototypic PWR, using various ECC water injection rates, pressures, subcoolings, and steam flows. The data is being used to develop correlation models that relate the flow rate of the water penetrating to the core to the steam flow rate. Using this correlation, the steam flow rate required for complete bypass can be predicted.

The proposed correlation models that describe this countercurrent steam-water flow phenomenon are statistical in nature. The experimental variables used in estimating the model parameters are all subject to random experimental error. Therefore, statistical techniques must be used to obtain valid parameter estimates and their associated standard errors.

The statistical properties of the data and the proposed models are nonstandard in two important respects. First, the models are implicit in nature, and in their more general forms they are nonlinear in their


parameters. For example, the Wallis flooding correlation model used with J* scaling [1] has the implicit form

\sqrt{J_g^*} + M\,\sqrt{J_f^*} - C = 0 , \qquad (1)

where J_g^* is a dimensionless gas flow rate, J_f^* is a dimensionless penetrating liquid flow rate, and the constants M and C are parameters to be estimated. A modified version suggested by Beckner, Reyes, and Anderson [2] is nonlinear in the parameters; it generalizes Eq. (1) by introducing additional terms and is referred to below as Eq. (2). In this model J_{f,in}^* is a dimensionless injected liquid flow rate, λ is the condensation potential, and the constants A, F, M, V, and C are parameters to be estimated.

The second important departure from the standard statistical models is that each of the experimental variables Jg*, Jf*, Jf,in*, and λ is subject to random experimental error that is not negligible.

Previous approaches used to obtain estimates of the model parameters involved rewriting the model in functional form, such as

\sqrt{J_g^*} = C - M\,\sqrt{J_f^*} \qquad (3)

or

\sqrt{J_f^*} = \left(C - \sqrt{J_g^*}\right)/M . \qquad (4)

Standard linear or nonlinear regression techniques were then used to estimate the model parameters. A critical study of this approach reveals that the implicit nature of the model and the experimental error associated with the variables violate the assumptions required for valid standard analyses. In addition, the choice of the dependent variable is arbitrary.

To place the statistical analysis on a solid foundation, these nonstandard features of the model must be considered. A nonlinear constrained maximum-likelihood-estimation (MLE) method of analysis developed by Britt and Luecke [3] addresses both nonstandard features simultaneously and was chosen as the appropriate estimation procedure. A description of the technique and examples of its application to ECC penetration experiments are given below.

STATISTICAL MODEL

The Britt-Luecke (B-L) method of estimating parameters in nonlinear implicit models is based on the following statistical model. We assume that k experimental runs are made. For each run, the m variables X1, X2, ..., Xm are measured, and each is subject to experimental error. The collection of observed experimental variables for the ith experimental run is denoted by the m-dimensional random vector

X_i = (X_i1, X_i2, ..., X_im)' ,  i = 1, 2, ..., k .

Because the experimental variables are subject to error, we actually observe the true value plus a random error. This process is modeled by

X_ij = x_ij + e_ij ,  j = 1, 2, ..., m ,  (5)

where x_ij is the true value of the jth variable for the ith experimental run and e_ij is the associated experimental error. We assume that the vectors of errors

e_i = (e_i1, e_i2, ..., e_im)' ,  i = 1, 2, ..., k ,

are independent and that each has a multivariate normal distribution with mean zero and covariance matrix R*. In vector notation we can write

X_i = z_i + e_i ,  i = 1, 2, ..., k ,  (6)

where z_i = (x_i1, x_i2, ..., x_im)' is the vector of true values and e_i is the vector of errors.

A model is hypothesized which relates the true, unobserved values of the experimental variables to some unknown parameters via an implicit function

f(x_i1, x_i2, ..., x_im; θ_0) = 0 ,  (7)

where θ_0 is the vector of unknown parameters to be estimated.

The complete collection of experimental data can be characterized by the q-dimensional vector

Z = (X_11, ..., X_1m, X_21, ..., X_2m, ..., X_km)' ,

where q = k × m. Then

Z = z_0 + e ,  (8)


where

z_0 = (z_1', z_2', ..., z_k')' = (x_11, ..., x_1m, x_21, ..., x_2m, ..., x_km)'

and

e = (e_1', e_2', ..., e_k')' = (e_11, ..., e_1m, e_21, ..., e_2m, ..., e_km)' .

The vector z_0 is the collection of true values of the variables at the time of measurement, and e is the vector of experimental errors for the entire experiment. From the distribution of the vectors e_i, the vector e has a q-dimensional multivariate normal distribution with mean zero and covariance matrix

R = \mathrm{diag}(R^*, R^*, \ldots, R^*) .

The true values z_0 and the vector of parameters θ_0 are related through the k-vector of functions

F(z_0, θ_0) = 0 .  (9)

Constrained Maximum Likelihood Estimation of θ_0

The joint density function for Z is given by

(2\pi)^{-q/2}\,|R|^{-1/2}\exp\!\left[-\tfrac{1}{2}(Z - z_0)' R^{-1}(Z - z_0)\right] . \qquad (10)

Once the experiment is completed and we observe that Z = z*, we consider the joint density to be a function of z_0 and θ_0 only. The likelihood function is given by

L(z, \theta) = (2\pi)^{-q/2}\,|R|^{-1/2}\exp\!\left[-\tfrac{1}{2}(z^* - z)' R^{-1}(z^* - z)\right] . \qquad (11)

This function is maximized with respect to z and θ, subject to the condition that they are constrained to those points such that

F(z, \theta) = 0 . \qquad (12)

The value of θ where this maximum occurs is taken as the estimate of θ_0.
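A minimal sketch of this constrained maximization for the simple Wallis form of Eq. (1) on simulated data (NumPy/SciPy and the SLSQP solver are assumptions, as are the noise variances, geometry of the example, and starting values; NLINMLE itself uses the algorithm of [3]):

import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
k = 40
C_true, M_true = 0.3, 0.5
jf = np.linspace(0.01, 0.16, k)                    # true liquid flow rates
jg = (C_true - M_true * np.sqrt(jf)) ** 2          # true gas rates from Eq. (1)
r_g, r_f = 1e-4, 1e-4                              # assumed error variances (R*)
jg_obs = jg + rng.normal(0, np.sqrt(r_g), k)       # observed = true + error
jf_obs = jf + rng.normal(0, np.sqrt(r_f), k)

def unpack(v):
    return v[0], v[1], v[2:2 + k], v[2 + k:]       # C, M, true jg, true jf

def objective(v):
    # weighted sum of squared errors, proportional to -2 log L of Eq. (11)
    _, _, zg, zf = unpack(v)
    return np.sum((jg_obs - zg) ** 2 / r_g + (jf_obs - zf) ** 2 / r_f)

def constraint(v):
    # Eq. (1) must hold at the true values; abs() guards tiny negative iterates
    C, M, zg, zf = unpack(v)
    return np.sqrt(np.abs(zg)) + M * np.sqrt(np.abs(zf)) - C

v0 = np.concatenate(([0.2, 0.4], jg_obs, jf_obs))
res = minimize(objective, v0, method="SLSQP",
               constraints=[{"type": "eq", "fun": constraint}])
print(res.x[:2])                                   # estimates of C and M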

Estimation Algorithm NLINMLE

The Battelle-developed code NLINMLE uses the maximization algorithm given in [3] to obtain estimates of the elements of θ_0. In addition to point estimates, the code produces a 95% confidence interval for each parameter and the covariance matrix for the parameter estimates. Several different plots are output, including a plot of residuals for a selected variable, predicted-vs-observed plots for a selected variable, and a plot of the fitted curve.

APPLICATION TO ECC BYPASS DATA

This technique has been applied to 2/15-scale data obtained from ECC bypass experiments conducted by Battelle Columbus Laboratories. The hypothesized model was of the form of Eq. (2), although for the following analyses parameter A was specified to be -5.5 (as a result of previous work). The experiments were conducted as follows: for a given experimental run, Jg*, Jf,in*, and λ were controlled at nominal values. Because of system fluctuations, however, they could not be held constant at these nominal values, and the recorded observed value was a measurement of the actual value attained. These measurements contained measurement error; therefore, the value recorded was of the form of Eq. (5). The resulting value of Jf*, with its measurement error, was then measured and recorded.

These experimental data are of the form previously discussed. We made k experimental runs. For each run the variables Jg*, Jf*, Jf,in*, and λ were measured, and each was subject to significant experimental error. In this case

X_i = (J_g,i*, J_f,i*, J_f,in,i*, λ_i)' .


For these data we assumed that the errors associated with the four experimental variables were mutually independent. Therefore, R* had the diagonal form

R* = \mathrm{diag}(r_1, r_2, r_3, r_4) .

In practice, the elements of R* are estimated from repeatability trials or by error propagation techniques.
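For instance, if a few repeat runs at a fixed nominal condition are available, the diagonal of R* can be estimated from the sample variances of the repeated measurements (a sketch only; the array layout is an assumption):

import numpy as np

def estimate_Rstar(repeats):
    """repeats: array of shape (n_repeats, 4) holding repeated measurements
    of (Jg*, Jf*, Jf,in*, lambda) at one nominal condition."""
    return np.diag(np.var(repeats, axis=0, ddof=1))  # unbiased sample variances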

Analysis of Simulated Countercurrent Flow Data

To assess the B-L technique and to test the NLINMLE code, the B-L method was applied to a simulated data set. This data set was constructed to approximate the type of data obtained from the ECC bypass experiments previously discussed.

The data set consisted of observed values of Jg*, Jf*, Jf,in*, and λ generated by a simulation experiment. The true values of Jg*, Jf*, Jf,in*, and λ were obtained by fixing the values of Jf,in* and λ in various combinations, varying one of the flow rates by specified increments over a typical experimental range, and then calculating the corresponding value of the other flow rate using Eq. (2) with C = 0.3, F = 0.4, M = 0.5, and V = 3.5. Thus the true values of the experimental variables satisfied Eq. (2) exactly, and the values of the parameters C, F, M, and V were known. Random experimental errors were independently generated using the random number generator GGNOR, with variance ratios approximately equal to those estimated from actual data. These errors were added to the true values of the variables, creating the simulated data set. In this way a data set was obtained from a known model with known parameters.

The simulated data set was analyzed by three different methods: the B-L technique; standard nonlinear regression with the model rewritten so that Jg* is the dependent variable; and standard nonlinear regression with the model rewritten so that Jf* is the dependent variable (Tables 1-3).

Recall that the true values of the model parameters were C = 0.3, F = 0.4, M = 0.5, and V = 3.5. The B-L technique provided excellent estimates. The standard nonlinear technique, using Jg* as the dependent variable, gave similar results. However, except for parameter C, this technique provided shorter confidence intervals than the B-L technique. Presumably, this is accounted

Table 1. Confidence intervals using the B-L technique

Parameter   Lower limit   Final value   Upper limit
C           0.2963        0.3098        0.3233
F           0.3764        0.3980        0.4195
M           0.4748        0.5380        0.6011
V           2.8432        3.5664        4.2895

Table 2. Confidence intervals using Jg* as the dependent variable

Parameter   Lower limit   Final value   Upper limit
C           0.2946        0.3088        0.3229
F           0.3758        0.3938        0.4117
M           0.4660        0.5249        0.5838
V           3.1071        3.6608        4.2145

Table 3. Confidence intervals using Jf* as the dependent variable

Parameter   Lower limit   Final value   Upper limit
C           0.2889        0.3143        0.3397
F           0.3998        0.4345        0.4692
M           0.5642        0.6484        0.7325
V           2.1471        3.0204        3.8938

for by the fact that the standard approach did not take into account all of the data error and therefore underestimated the standard errors of the estimates. The standard analysis with Jf* as the dependent variable produced very poor estimates of M and V. In fact, the confidence interval for M failed to cover the true value of 0.5.

The failure of the standard analysis to provide a good fit to the data when Jf* was used as the dependent variable is emphasized by comparing the observed data and the predicted values (Figs. 1 and 2). The observed values of Jg* are plotted against the observed values of Jf*. The smooth curve superimposed on the data points is the curve estimated from the data, with Jf,in* and λ set at the average fixed values indicated. It is clear that for λ = 7.0 the fitted curve obtained with the standard technique systematically overpredicts for low Jf* values and underpredicts for high Jf* values. If the analysis had been performed in this manner, the analyst might have been led to the false conclusion that the model was incorrect.

Analysis of Actual Data from an ECC Bypass Experiment

We next discuss the application of the B-L technique to an actual data set obtained from an ECC bypass ex-


(Plot of dimensionless gas flow rate versus dimensionless liquid flow rate JL*, with the fitted curve superimposed for average JLIN* = 0.1500.)

Fig. 1. Penetration curve predicted by the B-L technique.

(Plot of dimensionless gas flow rate versus dimensionless liquid flow rate JL*, with fitted curves for average JLIN* = 0.1500 and λ = 3.0 and 7.0.)

Fig. 2. Penetration curve predicted using Jf* as the dependent variable.

periment conducted at Battelle Columbus Laboratories. The experiment consisted of 66 experimental runs (Table 4).

For this data set we noted the following features. The B-L technique (Fig. 3) and the method using Jg* as the dependent variable (Fig. 4) gave similar results. They provided a good fit to the data for the low Jf,in* value of 0.0612, but they overpredicted for the Jf,in* value of 0.1209. The method using Jf* as the dependent variable (Fig. 5) did just the opposite: it fit the data well for Jf,in* equal to 0.1209 but overpredicted for Jf,in* equal to 0.0612. (The remaining data plots are not discussed because the three methods provided almost identical fits in those cases.)

It is clear that for this data set the hypothesized model is not adequate to describe the data for all values of Jf,in*. However, the B-L technique and the method using Jg* as the dependent variable indicate different types of corrections that need to be made to the model. The analysis by the B-L technique indicates that we should adjust the model to bring the curve down at high values of Jf,in*, whereas the latter method indicates that we need to adjust the model to bring the curve down at low values of Jf,in*. The question is: upon which analysis should we rely?

Viewing the results of the simulated-data-set analysis, we know that the method using Jf* as the dependent variable can indicate lack of fit even when the model is correct. The B-L technique showed no lack of fit when the model was correct. Therefore, we concluded that the overprediction observed for Jf,in* = 0.0612 using Jf* as the dependent variable was not convincing evidence


Table 4. Data from a 2/15-scale experiment

(For each of the 66 experimental runs, the observed values of Jg*, Jf*, JLin*, and λ.)

of model inadequacy. On the other hand, because the B-L technique fits a true model adequately, the overprediction at high values of Jf,in* observed with this technique was viewed as convincing evidence of model inadequacy. It was therefore concluded that the model was inadequate for high values of Jf,in*, and the model was adjusted accordingly.

Comparing the covariance matrices (Tables 5, 6, and 7), we note that in most cases the method using Jg* as the dependent variable produces the smallest standard errors of the estimates. The method using Jf* as the dependent variable produces the largest standard errors. For this case the estimates produced by the B-L technique fall between those of the two other methods. While additional analysis is required to judge the quality of the standard-error estimates produced by the various techniques, it is clear that the estimates produced by the two standard nonlinear regression methods cannot be trusted. Both of those methods consider only the error associated with the dependent variable and ignore the error and correlation structure associated with the other variables.

Table 5. Covariance matrix for estimates using the B-L technique

        C        F        M        V
C    0.0003
F   -0.0004   0.0007
M    0.0007  -0.0010   0.0025
V    0.0091  -0.0151   0.0282   0.4068


Table 6. Covariance matrix for estimates using Jg* as the dependent variable

        C        F        M        V
C    0.0002
F   -0.0003   0.0006
M    0.0007  -0.0010   0.0027
V    0.0082  -0.0132   0.0298   0.3991

Table 7. Covariance matrix for estimates using Jf* as the dependent variable

        C        F        M        V
C    0.0004
F   -0.0005   0.0008
M    0.0010  -0.0014   0.0039
V    0.0121  -0.0193   0.0402   0.5322

(Two panels: dimensionless gas flow rate versus dimensionless liquid flow rate JL*, for average JLIN* = 0.0612 and average JLIN* = 0.1209.)

Fig. 3. Penetration curve predicted by the B-L technique.


(Two panels: dimensionless gas flow rate versus dimensionless liquid flow rate JL*, for average JLIN* = 0.0612 and average JLIN* = 0.1209.)

Fig. 4. Penetration curve predicted using Jg* as the dependent variable.

CONCLUSIONS

The B-L technique places the statistical analysis of ECC bypass data on a solid theoretical foundation. It is designed to handle simultaneously both nonstandard features of these data: the implicit model and the error in all the variables. The technique has performed well both on simulated data and on actual experimental data obtained at Battelle Columbus Laboratories. The only practical problems we have encountered are convergence problems for some forms of the correlation model.

As we have seen, a wise but arbitrary assignment of a dependent variable (Jg*) produces favorable results when standard nonlinear regression techniques are used. However, there is no criterion for such a selection and no guarantee that a reasonable choice even exists. When a poor choice is made (for example, choosing Jf* as the dependent variable), poor estimates may be obtained, and misleading conclusions may be reached. The B-L technique eliminates the necessity of making such a choice and, at the same time, accounts for all of the experimental error. The B-L technique provides the appropriate statistical analysis of the physical processes being studied in ECC bypass experiments.


(Two panels: dimensionless gas flow rate versus dimensionless liquid flow rate JL*, for average JLIN* = 0.0612 and average JLIN* = 0.1209.)

Fig. 5. Penetration curve predicted using Jf* as the dependent variable.

REFERENCES

1. G. B. Wallis, One-Dimensional Two-Phase Flow, McGraw-Hill, New York, 1969.
2. W. D. Beckner, J. N. Reyes, and R. Anderson, Analysis of ECC Bypass Data, NUREG-0573, 1979.
3. H. I. Britt and R. H. Luecke, Technometrics 15(2): 233-47 (1973).


Nuclear Fuel Rod Stored-Energy Uncertainty Analysis: A Study of Error Propagation in Complicated Models*

A. R. Olsen and M. E. Cunningham
Pacific Northwest Laboratory*

Richland, Washington

ABSTRACT

With the increasing sophistication and use of computer codes in the nuclear industry, there is a growing awareness of the need to identify and quantify the uncertainties of these codes. In any effort to model physical mechanisms, the results obtained from the model are subject to some degree of uncertainty. This uncertainty has two primary sources. First, there is uncertainty in the model's representation of reality. Second, there is uncertainty in the input data required by the model. If individual models are combined into a predictive sequence, the uncertainties from an individual model will propagate through the sequence and add to the uncertainty of results obtained later. Nuclear fuel rod stored-energy models, characterized as a combination of numerous submodels, exemplify models so affected. Each submodel depends on output from previous calculations and may involve iterative, interdependent submodel calculations for the solution. The iterative nature of the model and the cost of running the model severely limit the uncertainty analysis procedures.

An approach for uncertainty analysis under these conditions was designed for the particular case of stored-energy models. It is assumed that the complicated model is correct, that a simplified model based on physical considerations can be designed to approximate the complicated model, and that linear error propagation techniques can be used on the simplified model.

INTRODUCTION

During a postulated loss-of-coolant accident (LOCA), stored thermal energy in a fuel rod is the major cause of fuel rod damage. As a result, the final acceptance criteria for emergency core-cooling systems require a calculation of the fuel rod stored energy during both normal operation and a LOCA. Because these calculations are used in regulatory decisions affecting commercial nuclear power plants, uncertainties in the calculated values have caused a temporary derating, or reduced operational flexibility, of many plants and delays in the startup of additional plants. Many of these uncertainties can be attributed to the lack of well-characterized data for fuel that has been irradiated during normal plant operation.

To help reduce these uncertainties, the Reactor Safety Research Division of the Nuclear Regulatory

*Operated by Battelle Memorial Institute for the U.S. Department of Energy.

Commission (NRC) is sponsoring the Experimental Verification of Steady-State Codes Program at Pacific Northwest Laboratory. The primary emphasis of the program is to obtain and analyze well-characterized fuel rod data for fuel that has been irradiated during normal power reactor operation. The data will then be used by NRC to verify and develop audit codes. The program also strives to identify the source and propagation of uncertainties in computer code calculations to gain a perspective on current code predictions. The methodologic approach used in analyzing these computer code uncertainties is the subject of this paper.

Discussion begins with a description of the complexity of the stored-energy conceptual model, although neither the actual computer code nor the precise procedures used in implementing the model are given. A brief discussion on application of the model follows. The steps in the modeling process are identified, and the relationship of the present work to this framework is clarified. A number of possible approaches to the solution of the problem are considered. Finally, the


chosen approach, simplified modeling, is illustrated for the stored-energy conceptual model.

STORED-ENERGY CONCEPTUAL MODEL

The calculation of cladding and fuel temperatures in many fuel performance computer codes rests on the following assumptions:

• The flow of heat is exclusively radial.
• Heat is produced only in the fuel and only at a rate proportional to the local neutron flux.
• The flow of heat across the fuel-cladding gap can be characterized by a conductance, the temperature-dependent components of which may be evaluated at the average gap temperature.

The third assumption implies the need for an iterative calculational procedure to estimate and then recalculate the gap conductance as a function of gap size, gas composition, and gas temperature. Many interacting variables and paths (Fig. 1) must be considered, and at the heart of the procedure is the gap conductance model itself.

Fig. 1. Relationship among variables in a typical steady-state fuel-performance code.

Fig. 2. Idealized radial temperature profile.

Once a converged gap conductance value has been achieved, a temperature profile for the fuel and cladding (Fig. 2) can be calculated in a fairly straightforward manner. From the coolant to the center of the fuel pin, there is first a small temperature rise across the water film at the cladding surface. This small rise is calculated from the film (conductance) coefficient, much as the temperature rise across the gap is calculated from the gap conductance. The temperature next rises approximately logarithmically through the cladding and then undergoes a steep rise across the gap in an amount equal


to the fuel-surface heat flux divided by the gap conductance. Next is an approximately parabolic temperature rise from the surface to the center of the fuel, which is caused by heat generation in the fuel. Care must be taken to account for both the temperature variation of fuel thermal conductivity and the spatial variation of the heat generation rate when calculating the fuel temperature rise. The basic models for calculating fuel-pin temperatures and stored energy follow.

Water-film temperature rise, ΔT_f

This temperature rise typically amounts to less than 20°C. Therefore, the film coefficient is either calculated from a correlation such as Dittus-Boelter or Jens-Lottes or is simply assigned an arbitrarily large value.

Cladding temperature rise, ΔT_c

This temperature rise can be exactly calculated by solving the following equation:

$$\int_{T_{co}}^{T_{ci}} K_c(T)\,dT = \frac{q}{2\pi}\,\ln\!\left(\frac{r_{co}}{r_{ci}}\right), \tag{1}$$

where
K_c(T) = cladding thermal conductivity,
T_co, T_ci = cladding outer and inner temperatures, respectively,
r_co, r_ci = cladding outer and inner radii, respectively,
q = linear power.

Values for K_c(T) are well known for the various cladding materials.

Gap temperature rise, ΔT_g

The gap temperature rise is calculated from the gap conductance in an iterative manner as previously described. In the absence of fuel-cladding contact or significant radiation, the gap conductance is given by

$$h_g = \frac{K_g}{d_e + g_c + g_f}\,,$$

where
h_g = gap conductance (W/cm²·°C),
K_g = gas thermal conductivity at the mean gap temperature (W/cm·°C),
d_e = effective physical gap width,
g_c, g_f = temperature jump distances at the cladding and fuel, respectively.

The methods of calculating d_e, g_c, and g_f are subjects of debate.

In addition, the adequacy of the entire model is debatable at small gap sizes. When solid contact occurs, there will obviously be some flow of heat through the contact points. Estimates of the consequent increase in conductance vary widely but generally take the form of an added solid-conductance term h_s, where
h_s = solid conductance,
K_m = mean conductivity,
P_a = contact pressure (apparent pressure),
H = cladding Meyer hardness,
F(·) = a function of fuel and cladding roughness, peak height, and wavelength,
n = exponent.

Values for K_m are easily calculated, but estimates of n and F vary widely and cause order-of-magnitude differences in the estimates of h_s. Definitive data are lacking.

When the conductivity of the initial fill gas (usually helium) becomes degraded by released fission gases, the contribution to the gap conductance from radiant heat transfer across the gap attains significance. The term added to the gap conductance has the form

$$h_r = \sigma F\,(T_s^2 + T_c^2)(T_s + T_c)\,,$$

where
h_r = radiant conductance,
F = view factor between two infinite concentric cylinders, found from F = [(1/ε_F) + (A_F/A_C)(1/ε_C − 1)]⁻¹,
T_s, T_c = fuel outer-surface and cladding inner-surface temperatures, respectively,
ε_F, ε_C = fuel and cladding emissivities, respectively,
A_F, A_C = surface areas per unit length of fuel and cladding, respectively,
σ = Stefan-Boltzmann constant.

The foregoing assumes the fuel and cladding surfaces to be grey surfaces.

The total ΔT_g is now calculated as

$$\Delta T_g = \frac{q}{2\pi r\,(h_g + h_s + h_r)}\,,$$

where r is the average gap radius.


Fuel temperature rise, ΔT_F

The temperature profile across the fuel is in theory almost as easy to find as the temperature rise across the cladding. It is necessary to solve only the following equation for any fuel radius r:

$$\int_{T_s}^{T_r} K_F(T)\,dT = \frac{q}{2\pi}\,\frac{\displaystyle\int_r^{r_s}\frac{dr'}{r'}\int_{r_0}^{r'} Q(r'')\,r''\,dr''}{\displaystyle\int_{r_0}^{r_s} Q(r)\,r\,dr}\,,$$

where
r_0, r_s = fuel inner and outer radii, respectively,
K_F(T) = fuel thermal conductivity,
Q = volumetric heat generation (W/cm³),
q = linear power (W/cm),
T_r = temperature at radius r.

Unfortunately, the structure of UO2 fuel varies more than that of the cladding; the conductivity of fuel made by one fabrication process may vary considerably from that of fuel made by another process. Furthermore, the fuel is subject to physical changes in the reactor which affect conductivity. These changes include cracking, fission-product swelling, pore migration, and grain growth. Similarly, the heat-generation rate is proportional to the fission rate in the fuel and is thus affected by all factors that affect the fission rate: neutron energy spectrum, ²³⁵U depletion, plutonium buildup, and fission-product migration. Thus, whereas the fuel temperatures are not hard to determine in theory, the expressions for K_F and Q at any point beyond the beginning of life are subject to considerable uncertainty.

Feedback Effects in the Calculation of Gap Conductance

A temperature profile such as that presented in Fig. 2 is necessarily the result of iterative calculation of gap conductance and fuel-pin temperatures. The major feedback effect from fuel temperature to gap conductance comes from the temperature dependence of the fuel-surface radius because, for a given power level and residence time, the cladding temperatures and inner radius are independent of gap conductance. The temperature dependence of the fuel outer radius is not found from the simple thermal expansion of a cylinder with a parabolic temperature distribution. The hoop stresses generated even at very low power are sufficient to crack the fuel radially. Therefore, the estimate of the fuel-surface radius as a function of power or temperature must include some accounting for the movement of cracked fuel, usually called relocation. Considerable debate centers around the various models, theoretical and empirical, for fuel cracking and relocation.

Another feedback effect from fuel to gap appears in fission-gas release. Both the gas conductivity and the temperature jump distance are known to be affected by the gas composition. The temperatures and times required for various rates of release are also subjects of debate.

Calculation of Stored Energy

Once a temperature profile in the fuel has been obtained, it is a relatively simple matter to calculate the stored energy. The specific heat C_p for UO2 is temperature dependent; the stored energy must be represented as

$$E_s = 2\pi \int_{r_0}^{r_s} \rho\, r \int_{T_{ref}}^{T(r)} C_p(T)\,dT\,dr\,,$$

where
T_ref = reference temperature, often the coolant temperature,
ρ = density of the fuel,
C_p = specific heat,
r_0, r_s = fuel inner and outer radii, respectively.

MODELING PROCESS AND THE APPLICATION PROBLEM

The conceptual model described in the previous section has been refined and implemented as a computer code. This fuel-performance model or code, GAPCON-THERMAL-3, has been described in detail [1] and may be simply thought of as a complicated mathematical function requiring specific input variable information and resulting in a transformed value for stored energy as an output. For present purposes the model defined by GAPCON-THERMAL-3 is assumed to be correct. With this assumption, two questions are of interest: What is the uncertainty in stored energy at time t given specified uncertainties for the input variables? What contribution does each input uncertainty make to the stored-energy uncertainty? The first question concerns the amount of uncertainty in predicting the stored energy in a fuel pin. Presumably this information would be useful in both the design and licensing processes of commercial nuclear reactors. The second question is concerned with tracing the source of the uncertainty in stored energy back to the basic input variables. Stated slightly differently, where should the effort be put to reduce stored-energy uncertainty? Implicitly this question assumes that the uncertainty in stored energy, even that of a realistic model, is great enough to cause concern with its source. A brief discussion of the modeling process helps to clarify the relationship of this study to other modeling studies.

Some of the stages of a model's development have been described as system analysis, system synthesis, model verification, model validation, model analysis, and model application [2]. System analysis and synthesis involve conceptualizing the model and formulating it into a model or computer code. Model verification determines the correctness of the code with regard to the intended conceptual model. Model validation is concerned with the soundness of the model with respect to reality. Here the output from the model is checked against actual data collected under the same conditions as those of the model. The shakedown testing of the model under various input conditions is a part of model analysis, which also includes sensitivity testing. Finally, the model is applied to problems of interest.

The modeling process is dynamic. Each stage may be undergone many times in the life of a particular model development. Not all of the stages need to be considered at any point in time, nor does a single stage need to be concerned with all of the other stages. In some studies only a single stage may be of concern. Stored-energy uncertainty analysis is a case in point. Previous studies have dealt with system analysis and synthesis along with model verification; the computer code GAPCON-THERMAL-3 is the product of that work. Although some model-validation work for GAPCON-THERMAL-3 has been conducted, it is not of direct concern for the uncertainty analysis. It is assumed that the model structure defined by the code is correct for the uncertainty study. The uncertainty analysis can be considered part of the model analysis stage, and in one sense it is a modification of a sensitivity analysis. The concern is how sensitive the model predictions are to errors in the input variables. One of the difficulties in the model-validation process is ensuring that the input to the model and the conditions under which the data are collected for comparison with the model predictions are the same. Without this assurance, the model-validation process is complicated with interpretation difficulties. One of the uses of the uncertainty analysis is to determine which input variables need to be determined precisely in a model-validation study.

The model may be mathematically described as a simple functional relationship between the output variable stored energy and the input variables. For example,

$$E_s = h(x_1, x_2, \ldots, x_n)\,,$$

where the x's are the input variables. For the uncertainty analysis it is further assumed that the x's are set at some level of interest and that it is possible to assign some level of uncertainty to them. For example, if the input variable is linear power, its nominal level may be 30 kW/m with an uncertainty of 10%. The uncertainty analysis provides information on the variability, or uncertainty, of stored energy and differs from the usual sensitivity analysis. The uncertainty analysis uses only a portion of the range of the input variables that are allowable in the code. A nominal value is selected, and a small range of values around that value is all that is of concern for a particular analysis. For a sensitivity analysis, the complete range of allowable values for each input variable is of interest. The uncertainty analysis may be described as the finding of the probability distribution of stored energy given the probability distribution of the input variables about the selected nominal values. Theoretically, there is no difficulty in solving this problem, but practically, the functions are too complicated to complete the necessary analysis.

POSSIBLE APPROACHES FOR UNCERTAINTY ANALYSIS

Because it is impossible to complete the necessary theoretical analysis, an alternate approach is necessary. A Monte Carlo simulation can be used as the basis for determining the uncertainty in stored energy and the contribution of each input variable uncertainty. This procedure entails assuming a probability distribution for each of the input variables, randomly sampling from each distribution, running the computer code at these selected values to determine the stored-energy value, and repeating the sampling process until enough stored-energy values have been generated to determine the stored-energy probability distribution. To obtain information on the percent contribution of the input variables, it is necessary to save all the input variables used in the Monte Carlo process as well as the computed stored-energy values. Another approach is to analytically simplify the model by finding a Taylor's series expansion of the model. The uncertainty analysis is then completed on this approximation by analytically determining the probability distribution of stored energy. Alternately, an empirical approximation may be found by using a response surface approach. Here some factorial combination of input variable values is run, and the stored-energy response is modeled as an empirical polynomial fit of the input variables. A reasonably good fit would be expected, for only a portion of the range of the input variables would be used. A final approach is to create a physical approximation to the model by creating a simple model based on physical principles applicable in nuclear fuel rod modeling. This simple model could be considered as a model that does not include all the embellishments of the more complicated model.
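A minimal sketch of the Monte Carlo approach just described is given below, assuming a cheap stand-in function for the expensive fuel-performance code. The function stored_energy_model, the nominal values, and the 3σ percentages are illustrative assumptions only; the actual code (GAPCON-THERMAL-3) is far more complex.

```python
# Monte Carlo uncertainty sketch with a hypothetical surrogate for the code.
import numpy as np

rng = np.random.default_rng(1979)

def stored_energy_model(linear_power, fuel_k, gap_width):
    """Hypothetical surrogate for the stored-energy calculation."""
    return 1.0e4 * linear_power / (fuel_k * (1.0 + 5.0 * gap_width))

# Nominal input values and assumed 3-sigma relative uncertainties (illustrative).
nominal = {"linear_power": 30.0, "fuel_k": 3.0, "gap_width": 0.01}
three_sigma_pct = {"linear_power": 10.0, "fuel_k": 10.0, "gap_width": 25.0}

n_trials = 5000
samples = {
    name: rng.normal(val, val * three_sigma_pct[name] / 100.0 / 3.0, n_trials)
    for name, val in nominal.items()
}

# Run the (surrogate) code at each sampled input vector; the sampled inputs are
# kept so that percent contributions could later be estimated from them.
energies = stored_energy_model(**samples)

rel_uncertainty = 3.0 * energies.std() / energies.mean() * 100.0
print(f"stored energy: mean = {energies.mean():.1f}, 3-sigma = {rel_uncertainty:.1f}%")
```

As the text notes, the cost of repeating such runs with the full code is what motivates the simpler approaches that follow.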

The simple physical model approach was chosen for the stored-energy uncertainty analysis. The cost of repeatedly running GAPCON-THERMAL-3 precluded using the Monte Carlo approach. The iterative nature of the model and its expression in implicit functional form made a Taylor's series expansion intractable. Although a response surface approach was feasible, it was felt that its empirical nature limited it in conveying the information to the application personnel. The simple model provided the nuclear engineers an opportunity to define a simple physical model by using the knowledge that they had gained from the complicated state-of-the-art model. In particular, parameterizations of iteratively defined submodels were defined with the understanding that constants could be defined on the basis of the complicated model output. Hence, not only was a simple model defined, but it was tied to the complicated model through some of the parameterizations.

The simple model also reduced the number of input variables that needed to be considered. The specification of the uncertainty in the input variables was also consequently simplified. Generally, the variables excluded were implicitly accounted for by the parameterizations of some of the submodels.

STORED-ENERGY SIMPLE MODEL

The simple model for stored energy uses the same basic principles as GAPCON-THERMAL-3. The stored energy in a fuel rod at any time is a function of the current average temperatures in regions of the rod: water film, cladding, fuel-cladding gap, and fuel. The temperature calculations begin with the bulk coolant temperature and advance radially inward to the fuel centerline temperature.

The first step in calculating stored energy is to calculate the temperature rise across the water film. For a cylindrical fuel rod with a constant water-film heat-transfer coefficient, the Newton rate equation may be written as

$$\Delta T_f = \frac{q}{\pi d_{co} h_f}\,,$$

where h_f is the average value used by the full model. Using the full-code-supplied value of h_f and an appropriate uncertainty for that value, the assumption of a constant h_f has little effect on the uncertainty analysis.

The temperature rise across the cladding is calculated next by using the Fourier rate equation for a cylindrical fuel rod with constant cladding thermal conductivity:

$$\Delta T_c = \frac{q\,\ln(d_{co}/d_{ci})}{2\pi K_c}\,.$$

The appropriate value for K_c is the average thermal conductivity from the problem and from the full model. This value provides a temperature rise across the cladding equivalent to that predicted by the full model. As with h_f, assuming a constant thermal conductivity has little effect on the uncertainty analysis, provided that an appropriate uncertainty is applied to K_c.

The cladding inner-surface temperature is the sum of the bulk coolant temperature and the water-film and cladding temperature rises:

$$T_c = T_w + \Delta T_f + \Delta T_c\,.$$

The temperature rise across the fuel-cladding gap, like the temperature rise across the water film, is also based on the Newton rate equation and may be written as

$$\Delta T_g = \frac{q}{2\pi r\,h_g}\,.$$

However, it is difficult to directly assign an uncertainty to the gap conductance. Therefore, propagation is also applied to the gap conductance calculation. An iterative solution is required, because thermal properties are dependent upon the average gap temperature, which, in turn, requires that the fuel-surface temperature be solved simultaneously:

$$T_s = T_c + \Delta T_g\,.$$

The gap conductance is specified to be

$$h_g = \frac{K_g}{d_e} + H_s\,.$$

The first term in the gap conductance equation is the "ideal" model for heat conduction across an open gap.


The effective gap width includes the physical gap and temperature jump distance. These, in turn, include fuel thermal expansion, fuel relocation, fuel- and cladding-surface roughness, and the thermal accommodation coefficient of the fill gas. The second term accounts for heat flow across the gap caused by solid contact between the fuel and cladding. This contribution to the gap conductance is primarily dependent upon the fuel- and cladding-surface roughness and the interfacial pressure.

Several simplifications are applied to the gap calculation so that the uncertainty analysis may be more easily applied. First, a single function is used to predict elastic changes in the fuel cross-sectional area as a function of fuel-surface temperature and linear heat rate:

$$F_p = a_1 + a_2 q + a_3 T_s + a_4 q^2 + a_5 T_s^2\,.$$

This equation simplifies the elastic change in fuel area relative to the fuel area at zero power and operating coolant temperature. The coefficients are determined by applying regression analysis to the fuel cross-sectional area, the fuel-surface temperature, and the linear heat rate obtained from the full model.

Second, inelastic changes such as fuel relocation and fission-gas swelling also affect the effective gap width but remain difficult to propagate uncertainties through. The inelastic changes mentioned are accounted for by a single parameter. This parameter, F_pr, is the ratio of the fuel area at fuel-cladding contact to the fuel area at zero power, the fuel area at zero power being dependent upon the inelastic changes. The value for this parameter is selected so that the gap conductance calculated by the simple model matches the gap conductance calculated by the full model. Because F_p and F_pr deal with fuel area, their square roots may be used to determine the physical gap size:

$$r_p = r_f\left(\sqrt{F_{pr}} - \sqrt{F_p}\right) + R\,.$$

Fuel-cladding contact and contact conductance for the uncertainty analysis are also determined from these two variables. If F_p is greater than F_pr, then the fuel has tried to expand farther than the cladding inner surface would allow, and contact occurs. The contact conductance is then specified to be

$$H_s = b\left(\sqrt{F_p} - \sqrt{F_{pr}}\right),$$

where the coefficient b is dependent upon the assumed maximum contact conditions.

Fission-gas release and fill-gas composition are not calculated by the simple model. The fill-gas composition is obtained from the full model. Fill-gas thermal conductivity for that composition is then calculated in the analysis as a function of the average gap temperature. Uncertainty is assigned to the gas thermal conductivity and not to the fill-gas composition.

Once the fuel-surface temperature is determined, the final temperature calculation is for the radial fuel temperature profile. To account for the temperature dependence of the fuel thermal conductivity, the integral conductivity method is applied to determine the fuel centerline temperature. As with previous input variables, the same thermal conductivity and flux depression used in the full model are used in the simple-model integral equation.

Rather than break the fuel into regions and calculate the total stored energy from the sum of the regions, a single calculation for stored energy is made based on the fuel average temperature. Because the radial temperature profile is parabolic, the normal double integral for stored energy may be reduced to

$$E_s = C_p(T_A)\,[T_A - T_w]\,\rho\,\pi d_f^2/4\,.$$

This is the stored energy, above coolant temperature, for the fuel.

The simple model for stored energy is a simple sequence of even simpler submodels. This property greatly simplifies the standard linear propagation of errors, as the propagation may be applied to each submodel. Each submodel is a different function of a subset of the original input variables and selected intermediate variables from previous submodels. The linear propagation is completed in terms of these variables. Use of these variables, accompanied by the use of relative variances, greatly simplifies the somewhat complicated propagation formulas. Following through the error propagation to stored energy results in a procedure for determining the uncertainty of stored energy, given that the model is correct and the specified uncertainties for the input variables are valid. This procedure provides the information to answer the first question of application interest. An expanded discussion of the simple model is given by Cunningham et al. [3].
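The sketch below illustrates the kind of submodel-by-submodel linear propagation of relative variances described above. The propagate helper, the lambda submodels, and the chaining are assumptions for illustration; they are not the paper's simple-model equations, although the nominal values and 3σ percentages echo Table 1.

```python
# First-order error propagation in terms of relative variances, one submodel
# at a time (illustrative sketch, not the authors' implementation).
import numpy as np

def propagate(f, values, rel_vars, eps=1e-6):
    """Return f(values), its relative variance, and each input's contribution."""
    y0 = f(**values)
    contrib = {}
    for name, x0 in values.items():
        bumped = dict(values, **{name: x0 * (1.0 + eps)})
        s = ((f(**bumped) - y0) / y0) / eps      # dimensionless sensitivity (x/y) df/dx
        contrib[name] = s**2 * rel_vars[name]
    return y0, sum(contrib.values()), contrib

# Two chained submodels: cladding temperature rise, then inner-surface temperature.
dT_c, var_dT_c, contrib_dT = propagate(
    lambda q, k_c: q * np.log(12.776 / 10.947) / (2 * np.pi * k_c),
    {"q": 300.0, "k_c": 0.16},                    # illustrative nominal values
    {"q": (0.10 / 3) ** 2, "k_c": (0.15 / 3) ** 2},
)
T_c, var_T_c, contrib_T = propagate(
    lambda T_w, dT: T_w + dT,
    {"T_w": 513.0, "dT": dT_c},
    {"T_w": (0.01 / 3) ** 2, "dT": var_dT_c},     # chain the relative variance forward
)
print(f"T_c = {T_c:.1f} K, 3-sigma = {3 * 100 * var_T_c**0.5:.2f}%")
```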

A useful piece of information is the percent contribution to the uncertainty of an output variable attributable to a particular input variable. This percent contribution may be determined even though the input variable is not directly used in the calculation of the output variable. The percent contribution to stored-energy uncertainty from neutron flux depression is an example. At each stage of the uncertainty calculations, the percent contribution to the relative variance of the variable can be determined from each term in the uncertainty equation. From the previous stage, the percent contribution from the relative variance of the input variables has been determined for each of the current terms. It is then a matter of appropriately distributing each percent contribution at the present stage to the input parameters and then recombining [3].
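Continuing the sketch above, the recombination step can be mimicked by pushing each stage's term-by-term contributions back to the original inputs. The dictionaries contrib_dT, contrib_T, var_dT_c, and var_T_c come from the previous illustrative sketch and are not the paper's quantities.

```python
# Distribute the intermediate variable's share of the output variance back to
# its own inputs and recombine into percent contributions (illustrative).
pct = {"T_w": contrib_T["T_w"] / var_T_c}
for name, term in contrib_dT.items():
    # dT_c's share of var(T_c), split among dT_c's inputs q and k_c
    pct[name] = (term / var_dT_c) * contrib_T["dT"] / var_T_c
print({k: f"{100 * v:.1f}%" for k, v in pct.items()})   # shares sum to 100%
```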

EXAMPLES OF STORED-ENERGY UNCERTAINTY

One use of the simple model to determine stored-energy uncertainty presents the change in uncertainties as a function of power under beginning-of-life conditions. The assumed values and uncertainties of the input parameters are listed in Table 1. The resulting uncertainty estimates for fuel temperatures and stored energy as a function of power are plotted in Fig. 3. The specification of input uncertainties is not an easy task. After considerable discussion it was determined that the engineers' estimates were similar to three-sigma limits on a Gaussian distribution.

Table 1. Summary of input for uncertainty vs linear power under beginning-of-life conditions

Parameter                        Value                   Percentage uncertainty, 3σ
Coolant temperature              513 K                    1
Cladding outer diameter          12.776 mm                0.2
Cladding inner diameter          10.947 mm                0.2
Fuel initial diameter            10.719 mm                0.2
Water-film coefficient           14.2 × 10⁴ W/m²·K       15
Cladding thermal conductivity    16.0 W/m·K              15
Roughness                        25.4 × 10⁻⁴ mm          10
Linear power                                             10
Gas thermal conductivity                                  2
Fuel thermal conductivity                                10
Fuel heat capacity                                        5
Flux depression                                           5
Accommodation coefficient                               100
Effective gap width                                      25

Fig. 3. Beginning-of-life uncertainties as a function of linear power for a typical boiling-water reactor fuel rod.

In this use of the model, the relative uncertainty for stored energy is almost constant, although the relative uncertainty for fuel temperature rises with power. The explanation for this behavior is that even though the variance and covariance for the temperatures increase with power, the partial derivative terms are decreasing with increasing power and temperature. These two effects offset each other and produce the observed, nearly constant relative uncertainty for stored energy.

Another use of the simple model regards a boiling-water reactor fuel rod during an assumed operating history. A two-cycle history is assumed, with both cycles beginning at a peak power of 33 kW/m. The rod power during each cycle is then reduced according to a depletion curve for 3% enriched UO2. To obtain the necessary time-dependent input parameters for the simple model, the fuel rod performance code GAPCON-THERMAL-3 was run with the assumed values for geometry, material property, and the operating history described above. The resulting input values used in the simple model are shown in Table 2, and the temperature and stored-energy uncertainty predictions are plotted in Fig. 4. The general trend of increasing relative uncertainty with burnup is primarily a result of increasing input uncertainty with burnup.

Fig. 4. Uncertainties as a function of burnup for a typical boiling-water reactor fuel rod.

This example of simple-model use is notable for the large decreases in relative uncertainty at the start and end of the second cycle. At these times the simple model and the full model predict that the fuel and cladding are in contact. For the uncertainty calculations, the simple model assumes that when the fuel and cladding come into contact, gap conditions are better known and uncertainties for fuel temperatures are reduced. A Monte Carlo simulation using the simple model and the assumed Gaussian distributions for the input variables showed that this discontinuity is artificially produced by the simple model. During the two observed contact periods, the simulation did not predict 100% contact and did not show the discontinuity (Fig. 5). The simple model at this stage is two different models depending upon whether or not contact takes place. The uncertainty analysis does not account for the uncertainty of contact. That is, model uncertainty is not included in the analysis.

Any uncertainty analysis should resolve which input or intermediate variables significantly contribute to the output variable uncertainty. The calculation of the percent contribution of each of the input variables provides this information. Table 3 summarizes the percent contribution to the variance of total stored energy. Notice that the percent contribution of the input variables differs between the beginning-of-life and power-history examples.

Table 2. Summary of input for uncertainty analysis of a boiling-water reactor fuel rod (in addition to input listed in Table 1)

Burnup      Linear power    Gas mixture (%)       Gas mixture        Fuel relocation*
(MWd/MTM)   (kW/m)          He    Kr    Xe        uncertainty (%)    (%)
0           32.8            100    0     0         2.0                0
1,850       30.8             99    0     1         3.9               50.5
3,600       29.0             97    0     3         6.7               61.5
5,240       27.1             94    1     6         9.7               68.9
6,760       25.3             90    1     9        12.6               78.2
6,800       32.8             90    1     9        12.6               69.8
8,590       30.2             85    2    13        15.7               66.6
10,260      27.7             79    3    18        17.7               73.5
11,780      25.1             75    3    22        20.7               79.1
13,140      22.6             70    4    26        23.0               88.3
14,340      20.0             67    4    29        25.8               93.9

*Decrease in hot gap size after accounting for fuel thermal expansion.

Fig. 5. Comparison of linear propagation and Monte Carlo estimates of uncertainty for the power history.

Table 3. Contribution of input variable uncertainty to total stored-energy uncertainty

Variable                         Beginning of life    End of life
Linear power                         24.5                 32.7
Fuel thermal conductivity             6.1                 22.4
Effective gap width                  46.7                  1.2
Gas thermal conductivity              0.0                  0.9
Initial dimensions                    0.1                  0.0
Coolant temperature                   0.0                  0.0
Water-film coefficient                0.0                  0.0
Cladding thermal conductivity         0.5                  0.9
Flux depression                       8.8                 24.8
Fuel heat capacity                    8.8                 16.8
Accommodation coefficient             4.4                  0.2
Roughness                             0.1                  0.1

SUMMARY AND CONCLUSIONS

A method for studying the uncertainty of an output variable of a complicated model has been given and illustrated for nuclear fuel rod stored energy. The use of a simple model to approximate the complicated full model provides an opportunity to use simple linear error propagation techniques and to reevaluate the important components or stages in the complicated model. The tying of the simple model to the full model through parameterizations opens up a number of simpler submodels that may be considered while keeping the model based on physical principles.

The linear propagation on the simple model allows the percent contribution of each of the input and intermediate variables to be determined for the output variable. This information may then be used to determine where the emphasis should be placed to reduce output variable uncertainty. It is also valuable in model-validation work, for it identifies input variables that need to be precisely determined for the experimental program. If these are precisely measured, then any differences in the model predictions and the experimental data may be attributed to model inadequacies and not to differing input conditions between the model and the experiment.

The limitations of the method include that it does require a simple model based on physical principles to be constructed. This may not always be possible. It also requires the specification of the uncertainties for the input variables. This is always a difficult task and should be done with care and with all available information. The power history example also illustrates the problems that may arise if two different submodels must be used at any stage. The uncertainty calculations do not account for the uncertainty in which submodel should be used. In a broader context, the method does not include any uncertainty caused by uncertainty in the model's correctness.

ACKNOWLEDGMENTS

This work was funded under the auspices of the Fuel Behavior Research Branch of the U.S. NRC and was undertaken with the guidance and support of W. V. Johnston and H. H. Scott of that NRC agency. We would also like to thank C. R. Hann and D. D. Lanning of Pacific Northwest Laboratory for their helpful comments.


REFERENCES

1. D. D. Lanning, C. L. Mohr, F. E. Panisko, and K. B. Stewart, GAPCON-THERMAL-3 Code Description, PNL-2434 (1978).

2. G. A. Mihram, Simulation, Academic Press, New York, 1972.

3. M. E. Cunningham, D. D. Lanning, A. R. Olsen, R. E. Williford, and C. R. Hann, Stored Energy Calculations: The State of the Art, PNL-2581 (May 1978).


Use of 3^n Parallel Flats Fractional Factorial Designs in Computer Code Uncertainty Analysis

Donald A. Anderson and Jack Mardekian
Department of Statistics, University of Wyoming, Laramie, Wyoming

Dale M. Rasmuson
EG&G Idaho, Inc., Idaho Falls, Idaho

ABSTRACT

Parallel flats fractions for the 3^n factorial provide near orthogonal designs for which sets of treatment combinations (flats) can be run sequentially with analysis after each flat. For the 3^n factorial the number of parameters involved, including the mean, main effects, and two-factor interactions, is

$$1 + 2n + 4\binom{n}{2} = 1 + 2n^2\,.$$

In most applications a relatively small number of these will be significant, and a design that will produce a good fit in relatively few runs is desired. The concepts and the sequential analysis are illustrated using a 3^4 example and a 3^6 example. Potential areas of application of these designs are discussed.

GENERAL BACKGROUND

The assessment and influence of uncertainty in values of input or independent variables upon an output (response variable) has received much attention in recent years. This is especially true in the nuclear industry, where concern is directed toward the effect of uncertainty in input variables to large computer programs upon the values calculated for several reactor characteristics. This was a main topic of interest at the First ERDA Statistical Symposium in 1975 [1].

The main idea in most of the methods that have been proposed to analyze the above problem is to select values for each independent variable and obtain the corresponding value for the response variable. Another value is selected for each independent variable, and a new value is obtained for the response variable. This process is repeated until the desired number of values of the response variable have been obtained. Regression techniques are then used to fit a surface to the set of values. This fitted surface is considered as an approximation to the true surface of the response variable.

The values of the independent variables are usually selected either at random (e.g., Monte Carlo techniques) or in a systematic manner (e.g., experimental designs). Cox [2] reports the results of a study comparing the Monte Carlo approach and the experimental design approach. A comparison of three sampling methods is discussed by McKay et al. [3].

The general approach in using an experimental design has been to choose a "suitable" 2^n fraction and then add star points so that quadratic effects could be estimated as well as the linear effects and certain of the two-factor interactions. The general model to be estimated for this case is of the form


$$y = \beta_0 + \sum_{i=1}^{n} \beta_i x_i + \sum_{i=1}^{n} \beta_{ii} x_i^2 + \sum_{i<j} \beta_{ij} x_i x_j + \epsilon\,.$$

In the above model, it is assumed that a second-order response surface in the independent variables is an adequate approximation to the true surface. That is, all three-factor and higher order interactions are assumed negligible.

A disadvantage of the use of 2^n fractions is that all of the runs or treatment combinations must be made before an analysis can be performed. If the number of parameters is large, then the number of runs required to estimate all of the parameters may be prohibitive because of time and cost considerations. Sometimes compromises are made by assuming that some of the two-factor interactions are zero. This reduces the number of runs needed to estimate the parameters of interest.

In this paper, we assume that (1) the response variable of interest is a function of several independent variables, (2) nonlinearity in most of the independent variables is expected, (3) the number of parameters to be estimated is large, (4) the cost of experimentation is high, (5) observations may be taken sequentially one at a time or in small groups such that analysis is possible after each observation or group of observations, and (6) some two-factor interactions are present, but no a priori rationale exists for excluding them from consideration.

It would be desirable to (1) design the experiment sequentially to coincide with the way observations are actually made, (2) develop a "sequential analysis" that would direct the selection of treatment combinations as a function of the previous data, and (3) determine a "stopping rule" that would terminate in the minimum number of runs.

The methods of construction considered by Anderson and Mardekian [4] and Mardekian [5] are for s^n factorial designs where s is a prime number or a power of a prime. These designs are developed for sequential application and analysis of treatment combinations in small groups. Each group of treatment combinations is called a flat, and a whole design is called a parallel flats fraction. The objective of this paper is to describe briefly the parallel flats fraction and the sequential analysis for the 3^n factorial. The designs and the analysis are illustrated by two numerical examples. The first of these is a very simple example involving only four factors and relatively few interactions. The second involves six factors, several two-factor interactions, and some higher order interactions.

BACKGROUND FOR 3^n DESIGNS

The designs discussed in this paper will have each factor at three levels. A treatment combination for the 3^n experiment is denoted by an n × 1 vector t with elements from {0, 1, 2} indicating the levels of the n factors. A fractional factorial design is some collection of N treatment combinations, denoted by T. Specifically, the symbol T is used to denote either the set of treatment combinations in the design or the n × N matrix having columns t_1, t_2, ..., t_N. In the experiment, observations are made at each treatment combination in T. The observation corresponding to t is denoted by y_t, and the vector of all N observations is denoted by an N × 1 vector y.

For convenience of notation, let T' = D_M = [d_1, d_2, ..., d_n], so D_M (M referring to the main-effect design matrix) is an N × n matrix with rows t'_1, t'_2, ..., t'_N. For i = 1, ..., n, the column d_i of D_M contains the levels for factor F_i. If for any i and j an interaction between factor F_i and factor F_j is to be included in the model, the two columns

$$d_i + d_j \quad\text{and}\quad d_i + 2d_j \pmod 3 \tag{1}$$

must be adjoined to D_M; and, in the usual notation, the corresponding interaction effects are denoted by F_iF_j^x, x = 1, 2.

This procedure is followed for every pair of factors that interact, and D represents the design matrix that results after all appropriate columns corresponding to interactions have been adjoined to D_M. For example, if all pairs of factors interact, there will be 2(n choose 2) columns adjoined to D_M to form the matrix D.
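A minimal sketch of this column construction is given below, assuming an arbitrary illustrative 9-run main-effect matrix; the helper name build_D and the chosen interacting pairs are assumptions, not the authors' code.

```python
# Adjoin the two interaction columns of Eq. (1) to the main-effect matrix D_M.
import numpy as np

def build_D(DM, interacting_pairs):
    """Return D: D_M plus columns d_i + d_j and d_i + 2*d_j (mod 3)."""
    cols = [DM]
    for i, j in interacting_pairs:
        cols.append(((DM[:, i] + DM[:, j]) % 3).reshape(-1, 1))      # F_i F_j
        cols.append(((DM[:, i] + 2 * DM[:, j]) % 3).reshape(-1, 1))  # F_i F_j^2
    return np.hstack(cols)

# Illustrative 9-run design in four factors, with pairs (F1, F3) and (F3, F4).
DM = np.array([[a, b, (2 * a + b) % 3, (a + 2 * b) % 3]
               for a in range(3) for b in range(3)])
D = build_D(DM, interacting_pairs=[(0, 2), (2, 3)])
print(D.shape)   # (9, 4 + 2*2)
```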

Because each factor appears at three levels, each main-effect symbol F_i, i = 1, ..., n, denotes two degrees of freedom. Similarly, if factors F_i and F_j interact, each of the two symbols F_iF_j^x, x = 1, 2, denotes two degrees of freedom, accounting for all 2·2 = 4 degrees of freedom associated with the interaction. These single degrees of freedom are defined by the set of orthogonal contrasts

$$H = \begin{bmatrix} -1 & 1 \\ 0 & -2 \\ 1 & 1 \end{bmatrix}, \tag{2}$$

where the rows of H correspond to levels 0, 1, and 2, respectively, and the columns to the linear and quadratic contrasts.


Using the usual notation, we write the model equations E(y) = XΓ. The parameter vector Γ includes the general mean μ along with two single-degree-of-freedom components for each main effect and four single-degree-of-freedom components for each two-factor interaction. The first column of the design matrix X is a column with every element one and corresponds to μ. Since each column of D corresponds to two degrees of freedom, linear and quadratic, each column of D is transformed into two columns in the X matrix. This is done by replacing each element in D by the corresponding row of H in Eq. (2).
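A sketch of this step, reusing D from the previous sketch, is shown below; build_X is an assumed helper name.

```python
# Form the model matrix X: a column of ones for the mean, then each column of
# D replaced by its two H-contrast columns (linear, quadratic).
import numpy as np

H = np.array([[-1, 1],
              [ 0, -2],
              [ 1, 1]])                        # rows correspond to levels 0, 1, 2

def build_X(D):
    N = D.shape[0]
    blocks = [np.ones((N, 1))]                 # column for the general mean
    for col in D.T:
        blocks.append(H[col])                  # level -> (linear, quadratic) row of H
    return np.hstack(blocks)

X = build_X(D)
print(X.shape)                                 # (N, 1 + 2 * D.shape[1])
```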

The regular 3^(n-r) fractions of the 3^n factorial are solutions (mod 3) to

$$A\mathbf{t} = \mathbf{c}\,, \tag{3}$$

where A is an r × n matrix of rank r and c is r × 1, both over {0, 1, 2} (mod 3). It is an easy task to write the alias sets from Eq. (3). Geometrically, we may regard the 3^n treatment combinations as points in the Euclidean geometry, EG(n, 3), of order n over the field of order 3. In this interpretation, Eq. (3) represents a linear (n − r) flat of 3^(n-r) points in EG(n, 3).

An alternative to choosing points on a single flat in EG(n, 3) is to consider the union of points on several flats. Thus, consider the flats generated by equations

$$A\mathbf{t} = \mathbf{c}_i\,, \qquad i = 1, 2, \ldots, f, \tag{4}$$

where A is r × n of rank r and c_i is r × 1. The design T corresponding to Eq. (4) is

$$T = T_1 \cup T_2 \cup \cdots \cup T_f\,, \tag{5}$$

where T_i is the set of solutions to At = c_i. The ith flat contains 3^(n-r) points, and the whole design T contains f·3^(n-r) points. The structure of the design naturally partitions the treatment combinations into groups of size 3^(n-r), which may be applied and analyzed sequentially. The design T is called a parallel flats fraction and will be denoted symbolically by

$$A\mathbf{t} = C = [\mathbf{c}_1, \mathbf{c}_2, \ldots, \mathbf{c}_f]\,. \tag{6}$$

PARALLEL FLATS FRACTIONS FOR THE 3^n FACTORIAL

A 3^n parallel flats fraction T consists of the union of treatment combinations obtained as solutions to

$$A\mathbf{t} = \mathbf{c}_i \pmod 3\,, \qquad i = 1, 2, \ldots, f, \tag{7}$$

where A is r × n of rank r. The fraction will be symbolically denoted as At = C, where C = (c_1, c_2, ..., c_f). Without loss of generality it may be assumed that A is in the form

$$A = [A^* \mid I_r]\,. \tag{8}$$

There are 3^(n-r) solutions to At = c_j (mod 3), and, hence, the fraction T contains N = f·3^(n-r) treatment combinations. With A in the form of Eq. (8), these solutions are easily expressed as follows. Partition t as

$$\mathbf{t}' = (t_1, \ldots, t_{n-r} \mid t_{n-r+1}, \ldots, t_n)' = (\mathbf{t}_1' \mid \mathbf{t}_2')\,.$$

Thus t_2 = c_j − A*t_1 (mod 3), and the solutions are

$$\mathbf{t} = \begin{pmatrix} \mathbf{t}_1 \\ \mathbf{c}_j - A^*\mathbf{t}_1 \end{pmatrix},$$

where t_1 takes all 3^(n-r) possible values.
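A short sketch of enumerating one flat in this way is given below; the function name flat and the example A* and c are illustrative assumptions, not taken from the paper's designs.

```python
# Enumerate the 3**(n-r) treatment combinations of one flat A t = c (mod 3),
# with A = [A* | I_r] so that t2 = c - A* t1 (mod 3).
import itertools
import numpy as np

def flat(A_star, c):
    """All solutions t of [A*|I] t = c (mod 3); one column per treatment."""
    r, n_minus_r = A_star.shape
    t1s = np.array(list(itertools.product(range(3), repeat=n_minus_r))).T
    t2s = (c.reshape(-1, 1) - A_star @ t1s) % 3
    return np.vstack([t1s, t2s])               # n x 3**(n-r) matrix of treatments

A_star = np.array([[1, 1], [1, 2]])            # illustrative: r = 2, n = 4
c = np.array([0, 1])
T = flat(A_star, c)
print(T.shape)                                 # (4, 9)
```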

The effects are partitioned into 1 + (3^(n-r) − 1)/2 = 1 + u alias sets S_0, S_1, ..., S_u, which are completely determined by the matrix A. The set S_0, containing the general mean μ, carries one degree of freedom, and each of the remaining sets S_i, i = 1, 2, ..., u, carries two degrees of freedom (linear and quadratic) for each flat. Let S_0c, S_1c, ..., S_uc denote the combined "effects" of the alias sets for At = c for a given c. The 3^(n-r) treatment combinations from any single flat At = c provide an orthogonal fraction for estimating these 3^(n-r) effects. The estimate of S_0c is, in fact, a linear combination of μ and the linear and quadratic effects of each effect in S_0. Similarly, the linear and quadratic effects of S_ic, i = 1, 2, ..., u, are linear combinations of the linear and quadratic effects of every effect contained in S_i. The exact form of these linear combinations is determined by c. Thus, in a parallel flats fraction, the alias sets are determined by A, and the various linear combinations making up the S_ic are determined by c.

Suppose E_1 and E_2 are two effects, main effect or interaction, belonging to alias set S_i, i = 1, 2, ..., u. Then, for any flat, the levels of E_2 are related to the levels of E_1 by one of the permutations in the symmetric group 𝒮 on three symbols,

$$\mathcal{S} = \{\,e,\ (012),\ (021),\ (01),\ (02),\ (12)\,\}\,, \tag{9}$$

where e denotes the identity. Each effect in the set S_i is related to the "effect" S_ic by one of the elements of 𝒮. The form of the linear combination relating effect S_ic to effects E ∈ S_i depends only on this relation.


Define six 2 × 2 matrices, one for each element of 𝒮, that describe how an effect enters the alias-set estimate when its levels are related to those of the set by the corresponding permutation:

$$G_e = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},\quad G_{(012)} = \begin{bmatrix} -\tfrac12 & \tfrac32 \\ -\tfrac12 & -\tfrac12 \end{bmatrix},\quad G_{(021)} = \begin{bmatrix} -\tfrac12 & -\tfrac32 \\ \tfrac12 & -\tfrac12 \end{bmatrix},$$
$$G_{(01)} = \begin{bmatrix} \tfrac12 & \tfrac32 \\ \tfrac12 & -\tfrac12 \end{bmatrix},\quad G_{(02)} = \begin{bmatrix} -1 & 0 \\ 0 & 1 \end{bmatrix},\quad G_{(12)} = \begin{bmatrix} \tfrac12 & -\tfrac32 \\ -\tfrac12 & -\tfrac12 \end{bmatrix}. \tag{10}$$

Suppose that S_i = {E_1, E_2, ..., E_l} and that the effects in the set S_i are related to the effect S_ic by g_1, g_2, ..., g_l, respectively, in 𝒮. Then in the flat At = c,

$$S_{ic} = \begin{bmatrix} G_{g_1} & G_{g_2} & \cdots & G_{g_l} \end{bmatrix} \begin{bmatrix} E_1 \\ E_2 \\ \vdots \\ E_l \end{bmatrix}, \tag{11}$$

where each E_k denotes the 2 × 1 vector of linear and quadratic components of effect E_k and G_{g_k} is one of the matrices defined in Eq. (10).

The relationship in Eq. (11) provides the basis for the sequential analysis. After observations from the jth flat At = c_j are made, estimates

sequential analysis. After observations from the /th flatAt = cy are made, estimates

- A

Sic/

£S2

UCj

are computed. Differences of estimates from the/th flatand kth flat are computed for all k<j. That is, wecompute

(12)

for i = 1, 2, . . . . «, and Ar =A 1, 2, . . . , / - A 1 .A nonzero difference in 5, c . - 5,Cjt or 5?c. - 5?

indicates that at least one of the effects that appear indifferent linear combinations between the kth and /thflats must be nonzero. Conversely, if

either all effects that occur in different linear combina-tions between the/th and kth flats in the ith alias set arezero, or the extraordinary event of several nonzero ef-fects accidently combining to zero has occurred. Thissecond event is so unlikely that one might infer, at leasttentatively, that the effects themselves are zero. Thisbrief description of the analysis is illustrated with nu-merical examples in the next sections.

A FOUR-FACTOR EXAMPLE

To illustrate the design construction and analysis, we consider the function

$$y = 5 + (2 - \cdots)\,\ln(2 + 2x_3 + x_4^2) + \cdots\,, \qquad 0 \le x_i \le 2\,,\quad i = 1, 2, 3, 4\,. \tag{13}$$

Given the function in Eq. (13), it is clear that factors F_1 and F_3 interact and factors F_3 and F_4 interact. No other interactions exist. It is important to observe that whenever terms involving the products of nonlinear functions of two or more variables exist, all degrees of freedom for the corresponding interactions will be nonzero. A second-order response surface design will not, in general, be adequate to approximate such a function.

The A matrix of Eq. (7) used to construct a design for this example is

$$A = \begin{bmatrix} \cdots & \cdots & \cdots & \cdots \\ 1 & 2 & 0 & 1 \end{bmatrix}. \tag{14}$$

The alias sets corresponding to this A matrix are easily written down [Eq. (15)]. In this case, μ is estimable. The effects S_ic will be identified as the main effect in the set S_i, i = 1, 2, 3, 4.

Three flats were required to resolve the significant effects in this example. For compactness all the data are presented in Table 1; however, in practice the flats would be chosen sequentially.

Data from each flat are analyzed separately to estimate μ and the linear and quadratic effect of each alias set. Because μ is estimable, its value is constant from flat to flat at μ̂ = 89.8. The estimates for the effects of the other four alias sets are given in Table 2. The form of the relationship between S_ic and the effects in the set is expressed only by the appropriate permutation symbol that corresponds to the appropriate matrix of Eq. (10).

Table 1. Design variables and data for example

The estimates from flat 1 alone simply indicate that each alias set has some nonzero effects. Comparisons of estimates from flats 1 and 2 indicate the following:

1. A difference in alias set S_1 implies that at least one of F_2F_4 or F_3F_4 is nonzero.
2. A difference in alias set S_2 implies that at least one of F_1F_4 or F_3F_4 is nonzero.
3. An equality in alias set S_3 indicates that both F_1F_4 and F_2F_4 equal zero. Combining information from 1-3, we might suspect F_1F_4 negligible (since F_1F_4 indicated zero) and F_2F_4 negligible (since F_2F_4 indicated zero). This is consistent, since interactions with factors F_3 and F_4 occur in both S_1 and S_2, and we suspect they interact.
4. A difference in S_4, even though small, indicates that at least one of F_1F_3, F_1F_3^2, or F_2F_3 is nonzero.

Table 2. Estimates of effects for sample data

Because F_2F_3 in S_1, F_1F_3 in S_2, and F_1F_2 in S_3 were related the same way in both flats 1 and 2, we cannot make inferences concerning these. The next flat chosen should result in different relationships for each of these.

Consider flat 3 and comparisons of estimates with flats 1 and 2.

1. An equality between flat 3 and flat 1 in alias set S_1 implies that F_2F_3 and F_2F_3^2 are negligible. A difference between flat 3 and flat 2 in alias set S_1 now confirms that F_3F_4 is nonzero.
2. Differences in alias set S_2 of flat 3 with both flats 1 and 2 indicate only that at least one of the three interactions present is nonnegligible.
3. An equality in alias set S_3 of flat 3 with both flats 1 and 2 indicates that all three interaction effects F_1F_2, F_1F_4, and F_2F_4 are negligible.
4. An equality between flats 3 and 1 in alias set S_4 indicates that both F_1F_3^2 and F_2F_3^2 are negligible. Therefore, we infer that F_1F_3 is nonnegligible.

Because F_1F_3 is nonnegligible and F_1F_3^2 has been inferred as zero, we conclude in alias set S_2 that both F_1F_3 and F_3F_4 are present.

After only three flats of nine runs each, it is possible to conclude with reasonable certainty that factors F_1 and F_3 interact and that F_3 and F_4 interact. All other interactions are inferred to be zero. The intercept μ, all main effects, and those two interactions (four degrees of freedom each) are estimable if all other interactions are inferred negligible. Thus, we have made inferences concerning 33 parameters with 27 runs.

Using the 27 runs in our design, we can estimate a polynomial approximation to the functional relationship in the original x variables. This polynomial should be a reasonable approximation to a Taylor's series expansion of the true function about the central value (1, 1, 1, 1). Given the joint distribution of x_1, x_2, x_3, x_4, it is possible to compute approximate moments and other characteristics of the distribution of y.
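A sketch of that final step is shown below. The design settings X27, the responses y27, the retained polynomial terms, and the uniform input distribution are all placeholders chosen for illustration, not the paper's data or fitted model.

```python
# Fit a polynomial (main effects, quadratics, and selected interactions) to the
# 27 runs, then approximate moments of y under an assumed input distribution.
import numpy as np

def poly_terms(x):
    x1, x2, x3, x4 = x.T
    return np.column_stack([np.ones_like(x1), x1, x2, x3, x4,
                            x1**2, x2**2, x3**2, x4**2,
                            x1*x3, x1*x3**2, x3*x4, x3**2*x4])  # retained interactions

rng = np.random.default_rng(0)
X27 = rng.integers(0, 3, size=(27, 4)).astype(float)            # placeholder design
y27 = rng.normal(90.0, 25.0, size=27)                           # placeholder responses

coef, *_ = np.linalg.lstsq(poly_terms(X27), y27, rcond=None)

xs = rng.uniform(0.0, 2.0, size=(50_000, 4))                    # assumed joint distribution
ys = poly_terms(xs) @ coef
print(f"E[y] ~ {ys.mean():.1f},  Var[y] ~ {ys.var():.1f}")
```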

If we had been unable to make inferences concerning all of the parameters in the model, one more flat would have been needed. This process would continue until final resolution occurs. It should be pointed out that the design used will always terminate in three flats if the number of interactions is two or less.

A SIX-FACTOR EXAMPLE

The data for the example of this section were provided by one of the authors (Rasmuson) and sent to the other two, who had no knowledge of the data source. Data were provided sequentially as required, and the analysis was conducted as described in the previous section. On the completion of the analysis, the model from which the data were generated was provided [6].

The A matrix of Eq. (7) used to construct the design for this example is

$$A = \begin{bmatrix} 1 & 1 & 1 & 0 & 0 & 0 \\ 1 & 2 & 0 & 1 & 0 & 0 \\ 1 & 2 & 0 & 0 & 1 & 0 \\ 1 & 1 & 0 & 0 & 0 & 1 \end{bmatrix}.$$

The total number of parameters in a model containing main effects and two-factor interactions is 73 = 1 + 2(6) + 4(15). These effects are partitioned into two-degree-of-freedom alias sets as indicated in Table 3. The first alias set, containing the mean, is given in Table 4, and the effect S_0c is identified with the mean. The effects S_ic are identified with F_i, i = 1, 2, 3, 4. It is noted that alias sets S_3 and S_4 contain two main effects; hence, with any single flat these effects are confounded.

Table 3 contains estimates S_ic, i = 1, 2, 3, 4, for the seven flats generated by

$$A\mathbf{t} = C = \begin{bmatrix} 0 & 1 & 2 & 0 & 0 & 1 & 1 \\ 0 & 2 & 1 & 1 & 2 & 0 & 1 \\ 0 & 1 & 2 & 0 & 1 & 1 & 2 \\ 0 & 2 & 1 & 2 & 1 & 1 & 0 \end{bmatrix}.$$

As in Table 2, the symbols e, (012), and (021) each represent a 2 × 2 matrix from Eq. (10). The estimates S_0c are given in Table 4, and the numerical values listed below the factorial effects give the exact linear combinations of these effects.

Differences between estimates from flats 1, 2, and 3 in all alias sets indicate the presence of several interactions, although in sets S_3 and S_4 the difference could be due to main effects. Several statements of the form "... at least one of the following effects must be nonnegligible ..." could be made from the first three flats.

Consider the estimates from flat 4. In the S_0 set the estimates between flats 3 and 4 are "very close," which indicates that the interaction effect F_4F_5 (both degrees of freedom) is negligible and that F_3F_6^2 is definitely nonzero. The estimates (both linear and quadratic) between flats 3 and 4 in alias set S_2 are "very close," indicating that F_1F_5 and the other interaction effects whose linear combinations change between these two flats in S_2 are negligible. A similar argument from the third alias set comparing flats 3 and 4 indicates that F_2F_4 and several other interaction effects in S_3 are negligible. Note that this is a preliminary observation and that further information on these effects may come in later flats, which could change this. However, these observations will guide our selection of flat 5.

Table 3. Estimates of effects from the 3^6 example

Table 4. Estimates of effects of S_0 for the 3^6 example

From Table 3, suppose we look only at the effects not suspected to be zero, as summarized in Table 5.

1. In the first alias set, F_2F_3, F_2F_3^2, and F_3F_5 are aliased in the same way in each flat.
2. In the second alias set, F_2 is aliased with F_3F_5^2 in each flat in the same way.
3. In the third alias set, F_3 and F_2F_5 are aliased in each flat in the same way.
4. All remaining effects in S_4 are estimable.

The next flat should be chosen to break the common aliasing on these effects as much as possible.

The fifth flat produces an apparent contradiction. In the alias set S_4, the estimates from flat 2 and flat 5 are identical, which would imply that both F_2F_5 and F_2F_5^2 are negligible. Most, but not all, of the variation in estimates from the five flats can be explained by the two main effects. For example, flats 4 and 5 should produce the same estimates if only main effects are involved. The indication is to reconsider one or more effects in S_4 which have been previously eliminated. Note that F_4F_5 and F_5F_6 appear at common relations in flats 2 and 5; thus, these are to be considered. The estimate of F_4F_5 as negligible has been definitely made from the S_0 set. Thus it would seem reasonable to reenter F_5F_6 for consideration.

Table 5. Summary of effects remaining after four flats

After five flats we have left in the model all main effects and the interactions F_1F_2, F_2F_5, F_3F_5, F_3F_6, and F_5F_6. The small deviations from exact fit which are observed (for example, in the S_0 set) indicate the presence of some higher order interactions. This, combined with the apparent pattern of subscripts, might lead one to reenter one more interaction, F_1F_5, since F_1F_2 and F_2F_5 are included. (Note: the other natural interactions F_2F_3 and F_2F_6 have been demonstrated to be zero.)

The model now contains the mean, six main effects, and six interaction effects for a total of 1 + 12 + 24 = 37 degrees of freedom. These are all estimable from the 45 runs of the design. A regression with these data produces an R² = 0.999.

There is, in this example, enough uncertainty about the inferences made that two more flats were taken to check for further contradictions. None were found, and regressions on six flats and seven flats, respectively, produced R² values equal to the R² above with essentially the same estimates of the coefficients.

As indicated at the beginning of this section, the analysis was completed by Anderson and Mardekian without prior knowledge of the data source chosen by Rasmuson. The presence of three-factor interactions complicated the analysis. The function used to generate the data includes terms such as 25 F_5 ln(1.3 F_6) and 0.9 F_2 exp(0.45 F_1), with the response truncated as y = max(3500, y).

The levels of the factors were as follows:

Factor      0         1         2
1           3.1       4.0       4.9
2          -1.35      0.5       2.35
3          16.0      17.0      18.0
4          -0.15      3.0       6.15
5          11.95     12.5      13.05
6           3.35      5.0       6.65


The inferences made after five flats, 45 observations, are essentially correct so long as we restrict ourselves to two-factor interactions. The nature of the data did lead to a suspicion of a multiplicative model; however, the appropriate Tukey-type tests were not conducted. The polynomial fit to the function provides an approximation to a Taylor's series expansion. Given a joint distribution on the independent variables, approximate moments of y could be obtained.

In essence the sequential analysis has led to an inference concerning 73 parameters with only 45 runs. The same conclusion would have been reached if N > 73 runs had been made and all parameters estimated, because the error variance is zero in this application. The sequential approach cannot do worse than this, since flats would be chosen to give estimates of all effects when N > 73. A second-order response surface approach would not be satisfactory, since the true model involves products of nonlinear functions of the independent variables.

SUMMARY

This paper has presented a new class of designs in which

1. the experiment is designed sequentially to coincide with the way observations are actually made,
2. the data are analyzed sequentially so that selection of new treatment combinations is influenced by the previous observations, and
3. a stopping rule exists that will terminate the procedure when its criteria are met.

The authors feel that these designs provide an alternate approach to experimentation that can reduce the cost. These designs can be effectively used where no a priori information exists about the way factors interact, as well as in the case where such information exists.

REFERENCES

1. W. L. Nicholson and J. L. Harris, Proceedings of the First ERDA Statistical Symposium, BNWL-1986, March 1976.

2. N. D. Cox, "Comparison of Two Uncertainty Analysis Methods," Nucl. Sci. Eng. 64: 258-65 (1977).

3. M. D. McKay, W. J. Conover, and R. J. Beckman, "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output from a Computer Code," Technometrics 21: 239-45 (May 1979).

4. D. A. Anderson and J. Mardekian, "Parallel Flats Fractions for the 3^k Factorial," Inst. Math. Stat. Bull. 8: 223 (July 1979).

5. J. Mardekian, Parallel Flats Fractions for the 3^k Factorial, Ph.D. Dissertation, Department of Statistics, University of Wyoming, June 1979.

6. G. P. Steck, D. A. Dahlgren, and R. G. Easterling, Statistical Analysis of LOCA FY-75 Report, SAND-75-0653, December 1975.


Estimating Residential Energy Consumption from the National Interim Energy Consumption Survey: A Progress Report

Thomas H. Woteki
Energy Information Administration
Department of Energy
Washington, D.C.

ABSTRACT

The Office of the Consumption Data System of the Energy Information Administration is responsible for developing a comprehensive national-level data base on energy consumption. The first step in that development, a multistage area-probability-sample survey of residential energy consumption (the National Interim Energy Consumption Survey), was recently completed.

The survey is unique in that it is the federal government's first probability-sample-based empirical data on energy consumption. It is also unique with respect to the opportunities and problems in estimating energy consumption by end use. In this report, some of these methodological features and analysis problems are described. In particular, the sampling design is described, as well as the procedures for obtaining energy consumption data, the problems with missing and irregular consumption data and the way they are handled, our present plans for estimating end use, and future plans for analyses and methodological improvements.

THE SAMPLING PLAN AND THE DATA

The National Interim Energy Consumption Survey (NIECS) is a four-stage area probability sample of households in the United States. The sample was designed to provide accurate, reliable national-level estimates of energy consumption and related characteristics of households.

For the first stage of the design, a stratified random sample of 103 primary sampling units (PSUs) was chosen, each PSU comprising a county or group of contiguous counties. The 38 PSUs with the largest 1970 population were selected with certainty, and they range in size from 1.1 million persons to 3 million persons and include the 25 largest standard metropolitan statistical areas. The remaining PSUs were grouped into 65 strata representing about 2 million persons each. The criteria for stratification included geographic region, metropolitan vs nonmetropolitan character, and population density. One PSU was selected from each stratum, the probability of selecting a PSU being proportional to its population.
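The following is a minimal sketch, not the survey's actual selection software, of drawing one unit per stratum with probability proportional to population using cumulated population totals; the stratum data are invented placeholders.

# Illustrative sketch: select one PSU per stratum with probability
# proportional to its 1970 population (PPS selection via cumulated sizes).
import random

def select_pps(psu_populations, rng=random):
    """Return the index of one PSU chosen with probability proportional to size."""
    total = sum(psu_populations)
    r = rng.uniform(0, total)
    cum = 0.0
    for i, pop in enumerate(psu_populations):
        cum += pop
        if r <= cum:
            return i
    return len(psu_populations) - 1  # guard against floating-point edge cases

# Hypothetical stratum with four candidate PSUs (populations in thousands).
stratum = [850, 420, 510, 220]
print("selected PSU:", select_pps(stratum))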

At the second stage, PSUs were divided into secondary sampling units (SSUs) of about 2500 persons. Data from the 1970 census for small areas such as block groups and enumeration districts, supplemented by data on post-1970 residential construction from the R. H. Donnelly Corporation, were used to construct the SSUs. Secondary units were selected within PSUs with probability proportional to population. Use of new construction data for estimating population has the effect of reducing the contribution to sampling variance due to variations in the population of second-stage units when the units are chosen in this manner.

At the third stage, SSUs were divided into small geographic segments, or third-stage units, of about 25 households each based on census block statistics and rough field counts. Depending on the size of the SSU, one or more third-stage segments per SSU were chosen with probabilities proportional to population. Finally,


each household within a selected third-stage unit was listed by detailed address and ten households were selected at random, without replacement, from among them. In all, about 4500 households were selected, of which about 4300 or 95% cooperated in the survey.

Among the data collected in personal interviews with cooperating households are socioeconomic characteristics of the households, energy-related characteristics of their dwellings (such as square footage and extent of insulation), and data on energy consumption and costs. Actually, the consumption and costs data were obtained directly from the utilities and suppliers serving the household. For each cooperating household, we obtained monthly or other periodic billing data for each fuel regularly used by the household. Subsequently, heating and cooling degree day data were matched to the billing periods of each of the fuels used by individual households.

ESTIMATION TASKS AND MISSING DATA PROBLEMS

The interim residential survey was designed to estimate energy consumption and the relationship between consumption and other characteristics of households. Three basic problems that arise in this connection are

• estimating the total amount of energy consumed by households (exclusive of transportation) and totals by fuel type,

• estimating the amounts consumed for various end uses, and

• developing performance coefficients such as Btu/degree day/square foot in the case of energy used for space heating.

We anticipate that solving these problems will require some of our best efforts. The latter two are especially challenging because they will require disaggregating and adjusting data, using unequally weighted data, incorporating features of the sampling plan into the estimation problems, and estimating regression parameters at various levels of aggregation. Before any of these problems can be dealt with, however, it will be necessary to solve two other problems: imputing missing data and, to a lesser extent, standardizing data to a fixed calendar year.

Ninety-five percent of the 4500 selected households agreed to a survey interview, and 90% of those who agreed also signed waivers permitting us to obtain their billing data from utilities. Thus, for a number of households, we are completely without any consumption data.* In addition, even among those who signed waivers, we were not always able to obtain a complete year of data. For example, some utilities maintain active billing files for only six months at a time. We estimate that 25% of the sampled households have missing data in one or more of the following ways. The problems are roughly ranked by frequency, the first three being the most frequent and vexing by far:

• completely missing all energy data,

• data for one or more fuels missing,

• irregular billing — such as for fuel oil,

• several months of data missing,

• missing end point(s) in a series,

• estimated bills instead of meter readings, and

• data for more than the 12-month reference period (April 1978 to March 1979) requiring prorating to the reference period.

The simplest method for imputing missing data when estimating a total is to ignore the problem, which is to say, one might base the estimate of the population total on the n1 respondents in the sample of n, n1 < n, for whom information is available. This might be acceptable when the sample is a simple random sample and when no information correlated with the characteristics of interest is available or if it is believed that nonrespondents do not differ systematically from respondents. However, this procedure is likely to result in a greater mean square error of estimation than other methods when these conditions do not obtain. Such is the case in the interim survey, wherein energy consumption can be expected to vary systematically with the type of fuel used for heating, climate and geography, size of dwelling, income, appliance mix, and so on, all of which suggest that there are better ways of imputing for missing data than the one suggested above. Accordingly, we are planning a three-phase effort for imputing missing data, the ultimate goal being to provide the best possible estimates of the totals and performance coefficients described above.

A THREE-PHASE PLAN FOR IMPUTING MISSING DATA

We will follow a cyclic, three-phase plan consisting of imputation, diagnosis and evaluation, and revision of imputation procedures. Our short-term goal is to improve the estimates of energy consumption produced

*Households that completely refused to participate in the survey are of concern - but not for the purpose of this paper.


from the NIECS, and our long-term goal is to establish sound methods for imputing missing consumption data in future editions of the residential survey. We are currently in the first phase of the first cycle of the plan and are looking ahead to the second phase.

In the first phase, regression techniques will be used to impute an annual consumption figure for each fuel independently, depending on what the fuel is used for. For example, taking the households for which we have complete electricity records, we will regress annualized electricity consumption on such variables as heating degree days, family income, size of dwelling, and number of family members, with separate regressions being performed for those households which use electricity for heating and cooling, those which use it for heating but not cooling, and so on. The regressions will use unweighted data and will be done independently for each fuel. To estimate an annual consumption figure for a household which is completely missing data on some fuel, we will simply substitute the household's scores on the predictor variables into the appropriate regression equation.
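As a hedged sketch of this kind of regression imputation (not the survey's production procedure), the following fits a separate least-squares equation for one end-use group among households with complete records and predicts annual consumption for a household that is missing the fuel entirely; the predictor variables and data values are illustrative placeholders.

# Illustrative sketch: regression imputation of annual electricity use,
# with a separate equation fitted for each end-use group.
import numpy as np

def fit_group(X, y):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    Xa = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xa, y, rcond=None)
    return beta

def impute(beta, x_row):
    """Predicted annual consumption for a household's predictor scores."""
    return float(beta[0] + np.dot(beta[1:], x_row))

# Placeholder complete-record data for one group: columns are heating degree
# days, family income ($1000s), dwelling size (100 sq ft), household members.
X_complete = np.array([[5200, 18, 14, 3], [6100, 25, 20, 4], [4300, 12, 9, 2],
                       [7000, 30, 22, 5], [4800, 15, 12, 3]], dtype=float)
kwh_complete = np.array([9500.0, 13200.0, 7100.0, 15400.0, 8600.0])

beta = fit_group(X_complete, kwh_complete)
missing_household = np.array([5600, 20, 16, 4], dtype=float)
print("imputed annual kWh:", round(impute(beta, missing_household)))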

A somewhat different approach will be used when only part of a fuel series is missing. In such cases, a household with partly missing data will be matched with similar households having complete records, and the fraction of annual consumption corresponding to the period in common to all households will be used to prorate the data for the partly missing household.

These procedures, and others suitable to the other missing data problems, will be carefully evaluated in the second phase of the imputation project. The regression and prorationing schemes will be explored with attention to the goodness of fit of the procedures, likely through the use of cross-validation techniques. It is expected that this diagnostic assessment will lead to reliable, statistically valid methods for imputing missing consumption data and more accurate estimates of total energy consumption, consumption by end use, and coefficients of performance.


Energy R&D in the Private Sector*

Pasquale Sullo and John Wilkinson
School of Management
Rensselaer Polytechnic Institute
Troy, New York

Howard K. Nason
President, Industrial Research Institute Research Corporation
St. Louis, Missouri

ABSTRACT†

The purpose of this study is to supply the Department of Energy (DOE) with information pertinent to the formulation of realistic national energy research and development (R&D) policies and to facilitate cooperation between government and business in the development and commercialization of new and improved energy technologies. The study gathered information on the amount of energy-related R&D private companies are doing, the types of energy-related R&D they report, and their perceptions about appropriate areas for government support.

To obtain this information, the Industrial Research Institute Research Corporation (IRI/RC), in conjunction with Rensselaer Polytechnic Institute (RPI), used mail questionnaires and personal interviews to gather data over a two-year period. Data collected in Year I (October 1977-May 1978) were submitted to DOE in a self-contained report in September 1978.‡ This report focuses on Year II findings (October 1978-May 1979).

The mail questionnaires obtained data on the amount of corporate R&D funding in specific energy-related technology areas. In Year II, the questionnaires were sent to 814 companies chosen to represent a broad spectrum of American industry and accounting for a substantial amount of the nation's R&D; 351 companies returned questionnaires.

The interviews gathered information from top-level executives on overall corporate energy strategies and on specific energy-related R&D plans and projects. In addition, the executives described their perceptions of the role government plays in facilitating and/or inhibiting the commercialization of new technologies and identified areas of government support that would be helpful to their respective companies.

Eighty-seven onsite and eight telephone interviews were conducted. The ten consultants who conducted the interviews were emeritus members of IRI/RC and had been senior R&D executives before they retired.

In this and all preceding reports, all information, quantitative and qualitative, is aggregated in a manner sufficient to protect the confidentiality of individual participants.

*Sponsored by the Department of Energy, Office of Policy and Evaluation, Contract No. Ex-77-C-01-6089. Any opinions expressed in this report are those of the investigators and do not necessarily reflect the position of the Department of Energy.

†The report from which this abstract is excerpted is available in two parts, the Executive Summary and the complete report, on request from Industrial Research Institute Research Corporation, 7800 Bonhomme Avenue, St. Louis, Missouri 63105.

‡"A Study of Energy R and D in the Private Sector: Phase Two," Industrial Research Institute Research Corporation, 7800 Bonhomme Avenue, St. Louis, Missouri 63105, September 1978.


MAJOR FINDINGS OF MAIL SURVEY

• The total corporate-funded energy R&D reported by 351 responding companies was approximately $823 million (1976),§ $945 million (1977), and $1.11 billion (budgeted 1978). Based on this sample, the percent of R&D that was energy-related ranged from approximately 10% in 1976 to approximately 12% in 1978. If one restricts consideration to only those companies reporting energy R&D, these percentages increase to approximately 15%.

• Extrapolation of the results of this survey indicates that for the three-year period, privately funded energy R&D was approximately $2 billion annually. The rate of annual increase from 1976 to 1978 in privately funded energy R&D does not seem to significantly exceed the inflation rate during the period.

• Of the ten technology areas receiving the most energy R&D corporate funding, six are associated with conservation (Chemical and Refining Processes, Improved Energy-Saving Processes, Electric: Transmission and Distribution, Electric: Environmental and Pollution Abatement, New Energy-Saving Processes, and Aircraft Engine and Airframe). This grouping represents 27% of the total itemized energy R&D reported. Three supply-oriented technologies appear in the top ten list and are concerned with near- to mid-term commercialization efforts (Geological Assessment Techniques, Batteries, and Oil Drilling and Production Techniques).

• When asked to comment on the degree of importance of various technology areas for receipt of DOE support, all aspects associated with nuclear R&D and coal gasification and liquefaction received very high ratings by a majority of respondents.

MAJOR INTERVIEW FINDINGS

Industry feels that it cannot shape long-range corporate energy strategies or undertake long-lead, high-risk energy R&D ventures in the current climate of uncertainty.

Companies call for a rational and consistent national energy program which will reduce the number of conflicting and ever-shifting regulations to which they are subject. Furthermore, they urge that price controls, allocations, and mandated conversions be lifted to establish free market conditions which they believe are essential to solving our national energy problems. A comprehensive energy plan would also include programs for each energy source or technology, including development goals.

Concern for an uninterrupted supply of energy has motivated many companies to ensure their own sources of supply.

In the interest of ensuring an uninterrupted supply of energy, consumer product companies have undertaken conservation efforts and developed various capabilities for producing in-house sources of energy. Companies who produce energy primarily for sale have also shown an interest in conservation and have undertaken exploratory projects to uncover more energy resources. These efforts, which primarily can bring only near- and mid-term results, are an attempt to reduce corporate dependence on uncertain outside energy sources. These uncertainties are caused both by the political vagaries of foreign governments and by our own government's shifts in pricing, allocations, and conversion regulations.

Although conservation has been a main element of corporate energy strategy, industry now feels that conservation efforts have reached a point of diminishing returns.

Industry reports that the obvious conservation improvements have been made. Further progress will require greater and greater capital expenditure and correspondingly high risk. Future conservation projects will be evaluated in relation to other projects on the basis of their potential return on investment.

The vast majority of companies recommend the development of nuclear and coal technologies to increase our national production of electricity.

This opinion was expressed in both the mail questionnaires and personal interviews. By supporting and expanding nuclear power and coal combustion for stationary equipment, more petroleum fuels will be available for essential portable equipment and chemical feedstocks. The development of an electric/hybrid automobile and liquid synthetic fuels can further ease the burden on diminishing petroleum supplies.

§If data from 61 Year I respondents who did not respond again in Year II are added to this figure, the reported amount of 1976 expenditures becomes $1.13 billion. In the main report, expenditures are reported for 102 technology areas.


There is general consensus that government should support basic research in universities, government laboratories, and industrial R&D settings and that it should not become involved in commercialization when industry has the capacity to do so.

Most companies agree that government support may be necessary for long-range technologies or where costs and risks are extremely high, for example in the development of nuclear power, oil resources, or rail transportation. However, a major opinion is that government can do more to stimulate industrial R&D by reducing the number of its regulations and by providing tax breaks and other incentives.

Although industry places emphasis on the importance of nuclear and coal technologies, it acknowledges that all forms of energy should be explored to help solve our energy problems.

When trying to stimulate the development of technology in renewable resources, for example, in solar energy, industry recommends that DOE support basic research rather than attempt to commercialize an immature technology.


The Autopsy Tissue Program*

Terry Fox and Gary Tietjen
Statistics Group S-1
Los Alamos Scientific Laboratory
Los Alamos, New Mexico

ABSTRACT

The autopsy tissue program was begun in 1960. To date, tissues on 900 or more persons in seven geographic regions have been collected and analyzed for plutonium content. The tissues generally consist of lung, liver, kidney, lymph, bone, and gonadal tissue for each individual. The original objective of the program was to determine the level of plutonium in human tissues caused solely by fallout from weapons testing. The baseline thus established was to be used to evaluate future changes. From the first, this program was beset with chemical and statistical difficulties. Many factors whose effects were not recognized and not planned for were found later to be important. Privacy and ethical considerations hindered the gathering of adequate data. Because the chemists were looking for amounts of plutonium very close to background, possible contamination was a very real problem. Widely used chemical techniques introduced a host of statistical problems - difficulties common to large data sets, unusual outlier detection methods, minimum detection limits, problems with aliquot sizes, and time trends in the data. The conclusions point out areas to which the biologists will have to devote much more careful attention than was believed.

INTRODUCTION

Because plutonium is extremely rare in nature, nonoccupational exposure to 239Pu is usually a result of fallout from atmospheric weapons testing. Occupational exposures may take place in facilities producing or using plutonium. Exposures can result from external radiation, ingestion, inhalation, or wounds. Data on exposed persons have been collected at several laboratories but on a less extensive scale than at Los Alamos.

The autopsy tissue program at Los Alamos Scientific Laboratory (LASL) was established in 1960. Its original objective was to validate urine bioassay estimates of plutonium in occupationally exposed laboratory employees and to determine the pattern of plutonium deposition in the body. A second objective, which developed from the first, is that of establishing baseline

*Report LA-UR-79-2782, Los Alamos Scientific Laboratory, Los Alamos, New Mexico. Work supported by the Department of Energy under Contract No. W-7405-ENG-36.

concentrations of plutonium in tissues of the non-occupationally exposed general population in various geographic areas. Once established, such baselines will be useful in monitoring changes related to the growth of the nuclear industry. It should be emphasized that the total amounts of plutonium found in tissue samples of an individual in this study are three to four orders of magnitude smaller than the ICRP recommended maximum permissible body burden of 40 nCi of plutonium for occupational exposures.

Tissues from seven geographic regions are collected. These regions include (1) Los Alamos, New Mexico; (2) New Mexico (other than Los Alamos); (3) Colorado; (4) New York; (5) Pennsylvania; (6) Illinois; and (7) Georgia-South Carolina. The tissues collected include bone (rib and/or sternum and/or vertebral wedge), kidney, liver, lung, lymph node, spleen, thyroid, and gonadal tissue. Pathologists from around the country provide these tissues as permitted by their local and state autopsy laws.

When these tissues are received, they are ashed and dissolved in acid. Only a fraction of the solution (the


aliquot size) is analyzed; the remainder is retained as an archival sample.

The samples are passed through an ion-exchange column and the isolated plutonium electrodeposited on stainless steel planchets. For samples analyzed before 1972, 239Pu tracer was added (just prior to ion exchange) to estimate the fraction (R) of plutonium recovered. Since June 1972, 242Pu has been the tracer of choice because of its longer half-life and lower energy of alpha decay. Beginning in 1976, the tracer has been added to the wet tissue prior to ashing to give an indication of the recovery for the entire analytical procedure. The alpha activity of the 239Pu spectrum is measured for 50,000 sec. The measured activity is divided by an efficiency factor (E), which is the fraction of the total activity reaching the detector. The result is given in disintegrations per minute (D),

D = (S/t1 - B/t2)/(RE),

where S is the sample count, B the average background count, and t1 and t2 the respective times (in minutes) for which the sample and backgrounds are counted. One disintegration per minute of 239Pu is approximately equivalent to 1.14 x 10^-12 g of 239Pu.

The data gathered to date (approximately 900 cases) are given in a special issue of Health Physics.* The data consist of the measured concentrations on each sample and an indication of whether the measurement is significantly greater than zero. Two methods were used to assess the significance of the sample count. In the first method, a minimum detection limit (the 99th percentile of the net background for reagent blanks) was set up and samples whose net count fell below the detection limit were declared not to be significantly different from zero; that is, nothing was detected. This method did not, however, take into account the recovery, efficiency, or count rate for the sample, and the second method consisted of constructing an approximate 95% confidence interval on the concentration (D) of each sample, based on propagation of error formulae. If the confidence interval included zero, the activity in the sample was judged not to be significantly different from zero. With the published data there is an expanded account of the history of the program, the measurement process, and the quality control program.
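As a hedged illustration of the second method (not the program's actual quality-control code), the sketch below computes D from the counting data and an approximate 95% confidence interval by propagating Poisson counting error through the formula, treating the recovery R and the efficiency E as known constants; all numerical inputs are invented.

# Illustrative sketch: net activity D = (S/t1 - B/t2)/(R*E) and an approximate
# 95% confidence interval from propagation of Poisson counting error.
import math

def net_activity(S, B, t1, t2, R, E):
    """D in dis/min, with S and B raw counts accumulated over t1 and t2 minutes."""
    return (S / t1 - B / t2) / (R * E)

def approx_ci(S, B, t1, t2, R, E, z=1.96):
    """Treats S and B as Poisson and R, E as fixed; returns (low, high)."""
    d = net_activity(S, B, t1, t2, R, E)
    var_rate = S / t1**2 + B / t2**2          # Var(S/t1 - B/t2) for Poisson counts
    sd = math.sqrt(var_rate) / (R * E)
    return d - z * sd, d + z * sd

# Invented example: 50,000 s = 833.3 min count times, modest recovery and efficiency.
S, B, t1, t2, R, E = 42, 30, 833.3, 833.3, 0.80, 0.25
low, high = approx_ci(S, B, t1, t2, R, E)
print(f"D = {net_activity(S, B, t1, t2, R, E):.4f} dis/min, 95% CI ({low:.4f}, {high:.4f})")
# If the interval includes zero, the sample would not be judged significantly
# different from zero under this method.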

B. G. Bennett of the Health and Safety Laboratory (Fallout Program Quarterly Summary Report, January 1, 1974, HASL-278) has estimated that 320 kCi of 239Pu were dispersed globally during atmospheric weapons testing. Beginning in 1965, levels of 239Pu in surface air were measured on a monthly basis at a number of localities throughout the world (Environmental Measurements Laboratory EML-356, Appendix). Annual averages, as calculated at McClellan Air Force Base, appear in Fig. 1, which shows that the levels of plutonium in the stratosphere began rising sharply about 1961, reached a peak about 1963, then fell off to the former levels in about 1967. Unfortunately, the localities at which the measurements were made do not coincide with those at which the autopsy tissues were taken (except New York City), but a study of the data for 1966-1977 for three widely separated localities in the United States (New York City, Miami, and Sterling, Virginia) indicates that these localities do not differ significantly in the total amount of 239Pu received, despite the fact that some localities lag behind others by a month. However, surface air data for Salt Lake City (the closest station to the Colorado, New Mexico, and Los Alamos sites) differ considerably from data for the

Fig. 1. Atmospheric plutonium levels (stratosphere and troposphere, 1960-1970).

east coast, so that fallout patterns across the country might have been quite different and could account for some geographical differences seen later.

There are also highly significant differences in the amounts of 239Pu in the air from month to month within a year for a given station. The annual total amounts of 239Pu collected by one air sampler at the given stations (Table 1) show sizeable year-to-year differences.

As a result, some time trends in the data are to be expected, particularly since people are still inhaling plutonium, which has retention times in the lung of 100-1000 days and in the bone and liver of 40-200 years.

The form of the fallout was most probably PuO2, and inhalation is believed to be the only significant pathway into the body. Bennett used a compartmental model of the plutonium intake and resulting burdens in the lung, liver, and bone, basing his estimates on the ICRP Task Group on Lung Dynamics model and observed levels of fallout in soil and air samples in New York City.

In Figs. 2 and 3, we have plotted (as squares, since it is the earliest data available) the annual median lung and liver concentrations for the New Mexico cases. The plots also show the results of Bennett's calculations as a solid line. The shape of his curves agrees with the autopsy data, and it is surprising that the agreement in magnitude is as good as it is. It is quite conceivable that a refinement in Bennett's parameters or our data could produce even better agreement. The autopsy data, then, does lend support to the theoretical model and does show that there are definite trends with time. The implication from these time trends is that reporting a single mean or median for a given locality is not sufficient; a summary for each year is necessary for future work.

STATISTICAL NATURE OF THE SAMPLE

It is important to realize that autopsy samples do not, in general, constitute random samples from all deaths. The reason for this is that some causes of death are more heavily represented in autopsy cases than in a sample of all deaths. Traumatic deaths or deaths from unknown causes (including unattended deaths) are more likely to require autopsy, although practices vary from place to place. As long as the "reason" for autopsy has nothing to do with the plutonium concentration in the tissue, the sample may be treated as a random one. Traumatic deaths and deaths from unknown causes are not believed to have anything to do with exposure to fallout. In order to verify this belief, we present (Tables 2 and 3) some common causes of death in our sample along with the associated plutonium concentration in lung and liver tissue. A chi-square test of independence was used to measure association between cause of death and plutonium concentration. Chi-square values of 32.5 and 36.4 with 28 degrees of freedom indicate no detectable association. Thus, our sample may be treated as a random sample with respect to plutonium concentration.
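As a hedged illustration of how such a test is evaluated (not the authors' computation), the sketch below checks the reported chi-square statistics against the chi-square distribution with 28 degrees of freedom; the underlying contingency tables are not reproduced here.

# Illustrative sketch: p-values for the reported chi-square statistics
# (cause of death vs. plutonium concentration category, 28 d.f.).
from scipy import stats

for label, chi2 in [("lung", 32.53), ("liver", 36.40)]:
    p = stats.chi2.sf(chi2, df=28)          # upper-tail probability
    print(f"{label}: chi-square = {chi2}, p = {p:.3f}")
# Both p-values exceed 0.05, consistent with the conclusion of no detectable
# association between cause of death and plutonium concentration.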

Many autopsies are done because the pathologist has obtained consent of the person or his next of kin. If a person knew or feared that he had been exposed to plutonium other than fallout (e.g., occupationally), he or his next of kin may have been more likely to give consent for autopsy. This might have been cause for real concern, but if there was even the slightest evidence of occupational exposure, the sample was classified as such and does not appear in the published data with which we are dealing here.

Table 4 gives the number of samples in each geographic-tissue-sex category. The age distributions for

Table 1. Total 239Pu concentrations in surface air for years 1966-1977 (aCi/m3)

Year   New York City   Sterling, Va.   Miami
1966   1475.10         1195.60         1074.79
1967    611.40          402.86          593.43
1968    965.70          829.00          848.30
1969    652.70          629.90          550.80
1970    774.05          659.00          752.20
1971    719.80          629.10          728.74
1972    325.29          275.60          327.90
1973    160.69          125.24          202.11
1974    464.91                          534.20
1975    240.53                          256.24
1976     74.40                           85.87
1977    251.49                          270.18

Salt Lake City (partial record): 1326.10, 727.50, 255.54, 690.30


Fig. 2. Lung tissue - median concentration vs year of death (New Mexico data, 1960-1975).

Table 2. Lung tissue: number of persons in each cause-of-death category (homicide, accident, injury, heart, pneumonia, cancer, alcohol/drugs, other), cross-classified by plutonium concentration category (dkg); chi-square = 32.53 with 28 d.f.

each geographical region are shown in Fig. 4. (The number of cases shown in Fig. 4 and Table 4 do not agree because the age was not known for every subject in the data base.) These may or may not be typical of the general population (New York is not); the effect of age is discussed later. The years during which the data were collected are given in Table 5. The effect of time is discussed later.

DATA EDITING

In every large set of data, one finds outliers (observations which do not appear to be a part of the bulk of


Fig. 3. Liver tissue - median concentration vs year of death (New Mexico data, 1960-1975).

Table 3. Liver tissue: number of persons in each cause-of-death category, cross-classified by plutonium concentration category; chi-square = 36.40 with 28 d.f. (dkg = dis/min per kilogram of wet tissue).

the data). These may result from errors in observation, transcription, keypunching, or a failure to measure what was intended (such as contaminated or misclassified samples). In our data, the plutonium concentration is near background, and even slight contamination may have a significant effect. Some contamination from natural uranium and thorium has been observed in freshly purchased reagents, on new stainless steel planchets, and as a result of processing a sample with a high activity along with other samples. The amount of contaminant added to the autopsy sample during the analysis may have been equal to the activity in the sample, thus causing the measurement process to give erroneously high results.


Table 4. Estimates of central tendency (dis/min per kilogram of wet tissue). For each geographic region, sex, and tissue, the table lists the weighted mean, the unweighted mean, the median, and N.


Table 4 (continued). *Region codes: LA = Los Alamos; NM = New Mexico (other than Los Alamos); CO = Colorado; NY = New York; PA = Pennsylvania; GA = Georgia and South Carolina; IL = Illinois.

Fig. 4. Age distributions of geographical groups (Los Alamos n = 74, New Mexico n = 105, Colorado n = 227, New York n = 35, Pennsylvania n = 269, Georgia n = 132, Illinois n = 45). The dots represent the 10th, 50th, and 90th percentiles. The endpoints of the rectangles are at the 25th and 75th percentiles so that they include the middle 50% of the data.


Table 5. Years during which data were collected

Geographic region          Years
Los Alamos                 1960-1963, 1966-1977
New Mexico                 1960-1963, 1966-1977
Colorado                   1970-1977
New York                   1967-1968
Pennsylvania               1974-197
Georgia-South Carolina     1972-1976
Illinois                   1973-1977

Measurements using small aliquots and small tissue samples are much more variable than samples from larger aliquots and larger tissues, and this fact plays a part in creating outliers. A third contributor to outliers is the fact that some solids may interfere with the measurement process. Finally, there is a possibility that, despite all efforts to prevent it, some occupational exposures may have crept into the data base. Frequently the only indication that an observation is an outlier is its magnitude. If it is much larger than the bulk of the data, we suspect the measurement process.

Large erroneous observations can seriously impair the statistical analysis of the data. They can bias the mean upward, increase variances, and cause tests of hypotheses to fail. For these reasons, we have omitted observations that have been identified as outliers by a standard statistical test. We believe such omissions will give more realistic estimates of means, standard deviations, and percentiles. We have used the Grubbs' statistic as a test for single outliers and the Tietjen-Moore test2 as a test for multiple outliers.
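As a hedged sketch of the single-outlier screening step (not the authors' exact procedure, and omitting the Tietjen-Moore multiple-outlier extension), the following computes the Grubbs statistic for the most extreme observation and compares it with the usual t-based critical value at the 0.05 level; the sample values are invented.

# Illustrative sketch: Grubbs' test for a single outlier at alpha = 0.05.
import numpy as np
from scipy import stats

def grubbs_single(x, alpha=0.05):
    """Return (G, critical value, is_outlier) for the most extreme point."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    G = np.max(np.abs(x - x.mean())) / x.std(ddof=1)
    t = stats.t.ppf(1 - alpha / (2 * n), n - 2)          # two-sided critical point
    G_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return G, G_crit, G > G_crit

# Invented tissue concentrations (dkg) with one suspiciously large value.
sample = [0.4, 0.9, 1.1, 0.7, 1.5, 0.8, 1.2, 0.6, 14.1]
G, G_crit, flag = grubbs_single(sample)
print(f"G = {G:.2f}, critical = {G_crit:.2f}, outlier: {flag}")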

In Table 6, we present the results of our outlier testing. For each geographic location, sex, and tissue type, the number of observations in that set (n) and the number of outliers detected (k) are given. Suspected observations are declared to be outliers only when found to be significant at the α = 0.05 level. Also presented are the case number, the concentration, and the percentile (percentage of observations less than) corresponding to the concentration.

The case against an outlier cannot be proven absolutely with statistical methods. There may be statistical evidence that the observation does not belong with the bulk of the data, but there is always some chance (however remote) that the measurement in question may not be erroneous. To assist in deciding the case against an outlier, we give some related information which, in many cases, supports the evidence that the observation is indeed erroneous. Because small aliquot fractions and small wet weights can lead to high apparent concentrations, aliquot size and wet weight of the tissue involved are reported. It is expected that, when one tissue of a given individual shows a relatively high concentration of plutonium, other tissues from that individual also will be high. If this is not the case, the high tissue value is suspect. We have, therefore, presented the percentiles of related tissues for the same individual. Altogether we detected and omitted 139 outliers in the 4373 observations (3.2%). For the most part, the outliers were obvious (from their magnitude) even without statistical tests; frequently they were several orders of magnitude larger than the closest observation.

ESTIMATES OF CENTRAL TENDENCY

After the outliers have been identified and removed, it is appropriate to estimate central tendencies (i.e., means, medians, etc.). Each geographic region, sex, and tissue combination is examined separately. For each of these sets the 10th, 25th, 50th, 75th, and 90th percentiles are calculated and give a good idea of the spread of the data (Table 7). The 50th percentile is the median.

The median and two means - unweighted and weighted - are shown in Table 4. The unweighted mean is the arithmetic mean of the data, and the weighted mean is related to aliquot size and wet weight. Measurements derived from small aliquot sizes and small wet weights are more variable than those from larger aliquots and larger tissues, because count data are considered to have a Poisson distribution with a parameter λ which is the average count per time interval. For a Poisson distribution, both the mean and variance are equal to λ. A 25% aliquot would yield a sample with an average count (or variance) of λ/4. For such an aliquot, the measured activity x is multiplied by 4; hence, the quantity of interest is the variance of y = 4x. The variance of y is 4²[Var(x)] = 16 Var(x) = 16(λ/4) = 4λ, so that the variance of a 25% aliquot is four times that of the undivided sample. In other words, a 25% aliquot has twice the standard deviation of a 100% aliquot. The same is true of a tissue with small wet weight. The weighted means in Table 4 use inverse variances as weights so that small tissues and small aliquots get less weight.
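As a hedged sketch of an inverse-variance weighted mean of this kind (not the authors' exact weighting), the following takes the variance of each scaled-up measurement to be proportional to 1/(aliquot fraction), following the Poisson argument above, so that the weight is proportional to the aliquot fraction; a wet-weight factor could be folded in the same way. The data are invented.

# Illustrative sketch: inverse-variance weighted mean where Var(y_i) is taken
# proportional to 1/f_i (f_i = aliquot fraction), per the Poisson argument.
import numpy as np

def weighted_mean(concentrations, aliquot_fractions):
    conc = np.asarray(concentrations, dtype=float)
    f = np.asarray(aliquot_fractions, dtype=float)
    weights = f / f.sum()        # weight proportional to aliquot fraction (inverse variance)
    return float(np.sum(weights * conc))

# Invented concentrations (dkg) and the aliquot fraction each was based on.
conc = [1.2, 0.8, 2.5, 0.9]
frac = [1.00, 0.25, 0.10, 0.50]  # a 25% aliquot gets one quarter the weight of a full sample
print("unweighted mean:", np.mean(conc))
print("weighted mean:  ", weighted_mean(conc, frac))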

AGE TRENDS

The age at death of the persons considered in this report was an uncontrolled variable, and the observed age distributions might not be typical of the population at large. The age distributions over the entire time period are shown by locality in Fig. 4. Although the

Table 6. Results of outlier testing. For each geographic region, sex, and tissue type, the table lists the number of observations n, the number of outliers detected k, and, for each outlier, the case number, the concentration (dkg = dis/min per kg of wet tissue), the outlier percentile, the aliquot size (%), the wet weight (kg), and the percentiles of related tissues from the same individual.

Table 7. Percentiles (10th, 25th, 50th, 75th, and 90th) by geographic region, sex, and tissue type.

distributions resemble each other generally, the New York data is a clear exception: the individuals from that population are much younger than those from other areas. This difference may be because the New York samples are largely from unclaimed bodies and traumatic deaths, which occur more frequently in younger males.

It has been suggested (Annual Report of the Biomedical and Environmental Research Program, Jan.-Dec. 1973, LASL Health Division, LA-5633-PR, p. 32) that for a given exposure the amount of plutonium in the liver increases with age. The same effect, to a far lesser degree, was noted for lung tissue. If age trends are present, it is important to adjust for them before making geographical comparisons. Separate regressions indicate no dependence between age and geography. To test for such trends with the autopsy tissue data presently available, we selected four very short segments of time (1968-1969, 1970-1971, 1972-1973, and 1974-1975). During these time segments, trends should be nearly constant. For the liver and lung tissue data (over all ages and localities), the plutonium concentration vs age at death was fitted to a linear relationship by least squares for each of the four short time periods. Another line was fitted to the data (for each tissue separately) over the whole time period (1968-1975). Tests of whether the slope of the line is significantly different from zero were made. For the liver, the slopes are consistently different from zero, but a single line fits as well as separate lines for each time period. We conclude that the linear relationship dkg = 0.91356 + 0.01682(age) best represents the effect of age on liver concentration. Over an 80-year lifetime, an increase of about 1.3 dkg could be expected in the liver. From age 40 to age 80, the increase would be about 0.67 dkg due to age alone.
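As a hedged illustration (not the authors' regression run), the sketch below evaluates the fitted liver relationship to reproduce the quoted age-related increases; the least-squares fitting itself would use the per-case concentrations, which are not reproduced here.

# Illustrative sketch: predicted age-related increase in liver concentration
# from the fitted line dkg = 0.91356 + 0.01682 * age.
def liver_dkg(age):
    return 0.91356 + 0.01682 * age

increase_lifetime = liver_dkg(80) - liver_dkg(0)     # about 1.3 dkg over an 80-year lifetime
increase_40_to_80 = liver_dkg(80) - liver_dkg(40)    # about 0.67 dkg from age 40 to age 80
print(f"0 to 80 years:  {increase_lifetime:.2f} dkg")
print(f"40 to 80 years: {increase_40_to_80:.2f} dkg")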

For lung tissue, the evidence of a trend with age is not convincing.

For kidney, tracheobronchial lymph node, rib, and male gonadal tissue, there is no detectable effect of age for any of the time periods.

For vertebrae, the slope of concentration vs age at death is significantly different from zero for the 1974-1975 data and the 1968-1975 data. More importantly, the slopes for this tissue are negative (or near zero), which supports the hypothesis that the skeleton is being remodeled by transfer of plutonium from skeleton to liver. Moreover, the slopes do not seem to differ from each other, particularly if the 1968-1969 data is omitted. An estimate of the slope


(from the 1970-1975 data) is -0.0073. A single regression line is not adequate; the age effect is affected by the year of death (i.e., the biological effect of aging is also a function of atmospheric concentration). This fact makes it necessary to report both year of death and age at death when reporting means or medians.

SEX DIFFERENCES

There are roughly twice as many males in this study as females. To test the hypothesis of sex differences, we used all the Colorado 1970-1977 data adjusted for age trends. The results are presented in Table 8. The Mann-Whitney test shows that there are no significant differences due to sex.

GEOGRAPHICAL COMPARISONS

We now wish to compare levels of plutonium concentration in the various geographical regions. Because the data depend upon age at death and year of death, we attempt to eliminate these factors by considering only very short segments of time (i.e., year of death) (1974-1975 and 1967-1968) and subtracting out the

Table 8. Sex comparisons in Colorado

Tissue         Female   Male   p-value(a)
Bone             17       32   0.1151
Kidney           49       92   0.7096
Lymph node       42       88   0.4851
Rib              10       22   0.1932
Spleen           18       31   0.5611
Thyroid          12       14   0.3031
Vertebrae        27       44   0.1356
Lung             60      120   0.3594
Liver            64      124   0.1528

(a) Significant if less than 0.005.

age trends found during those time periods. Almost all of the subjects in this sample were born before 1945 and hence had nearly equal exposure times. Plots of median plutonium concentration vs age at death for lung and liver tissue for 1974-1975 are given in Figs. 5 and 6. These periods of time were selected because they include the major portion of the data and because they are the only periods where data is available from certain geographical locations.

Fig. 5. Lung tissue - median concentration vs age at death (1974-1975).

Fig. 6. Liver tissue - median concentration vs age at death (1974-1975).

There is no evidence that the concentrations are normally distributed in any of the tissues. The kidney, vertebrae, and gonadal tissues are the only tissues in which the concentrations appear to be lognormally distributed. The W-test was used to establish this conclusion [3].

As a result of the above testing, we have chosen to use a nonparametric procedure. The procedure we use has been recommended by Lin and Haseman [4] and Conover [5]. This procedure consists of a Kruskal-Wallis test of the significance of among-region differences at the α = 0.05 level. When this test indicates overall significance, Mann-Whitney tests are performed for all pairwise comparisons of the geographic regions (at the 0.05 level). If the Kruskal-Wallis test is not significant, then all pairwise comparisons are declared not significant.
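The procedure just described can be sketched as follows; the region-to-concentration mapping `groups` is illustrative only, and the 0.05 cutoffs mirror the text.

```python
# Sketch of the Lin-Haseman/Conover two-stage procedure: an overall
# Kruskal-Wallis test, followed by pairwise Mann-Whitney tests only when the
# overall test is significant. Region codes and data are hypothetical.
import numpy as np
from itertools import combinations
from scipy import stats

def regional_comparison(groups, alpha=0.05):
    h_stat, p_overall = stats.kruskal(*groups.values())
    results = {"kruskal_p": p_overall, "pairwise": {}}
    if p_overall < alpha:          # otherwise all pairs are declared not significant
        for a, b in combinations(groups, 2):
            _, p = stats.mannwhitneyu(groups[a], groups[b], alternative="two-sided")
            results["pairwise"][(a, b)] = p
    return results

rng = np.random.default_rng(2)
groups = {r: rng.lognormal(m, 0.6, 40) for r, m in
          [("LA", 0.8), ("NM", 0.7), ("GA", 0.6), ("IL", 0.3), ("PA", 0.3), ("CO", 0.2)]}
print(regional_comparison(groups))
```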

The medians adjusted for age trends (Table 9) are ordered from largest to smallest. This ordering indicates which geographic regions have consistently large medians.

Table 10 presents the results of the Kruskal-Wallis tests. For those tissues in which the p-value exceeds 0.05, no significant differences among regions are indicated. For the other tissues, there is an overall effect, and we proceed to test pairwise differences with the Mann-Whitney test.

Table 11 summarizes the results of the Mann-Whitney testing. For each tissue, those regions underlined with the same line do not differ significantly. Median values are given in parentheses. Even in the cases where there are significant differences, however, the differences in median are quite small - on the order of 1 dis/min per kilogram of tissue - so they may not be of any practical significance.

For example:

Tissue                                        Interpretation
Kidney, vertebrae, female gonad, spleen,      No significant differences
  all 1967-1968 tissues
Liver                                         LA, NM, GA not significantly different; IL, PA, CO not
                                                significantly different; LA, NM, GA significantly
                                                greater than IL, PA, CO
Lymph node                                    NM, LA, CO not significantly different; PA significantly
                                                lower than NM, LA, CO
Rib                                           LA, NM not significantly different; PA significantly
                                                lower than LA, NM
Male gonad                                    LA, PA not significantly different; GA, CO, NM not
                                                significantly different; LA, PA significantly greater
                                                than GA, CO, NM
Thyroid                                       LA, PA, CO, IL not significantly different; GA, NM not
                                                significantly different; LA, PA, CO, IL significantly
                                                greater than GA, NM
Lung                                          IL significantly lower than the other regions; the
                                                remaining regions divide into two groups: NM, LA, GA,
                                                CO on the high end and GA, CO, PA on the low end; GA
                                                and CO are members of both groups

Table 9. Medians adjusted for age effects
(regions ordered from largest to smallest adjusted median)

1974-75
  Kidney         PA, LA, GA, ...
  Liver          LA, NM, GA, ...
  Lung           NM, LA, GA, ...
  Lymph node     NM, LA, CO, ...
  Rib            LA, NM, PA, ...
  Vertebrae      NM, CO, GA, ...
  Female gonad   CO, LA, PA, ...
  Male gonad     LA, PA, GA, ...
  Spleen         LA, PA, GA, ...
  Thyroid        LA, PA, CO, ...

1967-68
  Liver          LA, NM, NY
  Lung           LA, NM, NY
  Vertebrae      NM, NY, LA


Table 10. Results of Kruskal-Wallis tests for geographic differences

1974-1975          p-value
Kidney             0.5764
Liver              0.0000
Lung               0.0000
Lymph node         0.0247
Rib                0.0072
Vertebrae          0.0560
Female gonad       0.9507
Male gonad         0.0077
Spleen             0.0969
Thyroid            0.0110

1967-1968
Liver              0.8001
Lung               0.1277
Vertebrae          0.1202

RELATIONSHIPS BETWEEN LIVER CONCENTRATION AND CONCENTRATION OF OTHER TISSUES OF THE SAME INDIVIDUAL

We wish to investigate the relationship between plutonium concentration in the liver and plutonium concentration in selected other tissues (lung, vertebrae, gonad) of the same individual.

Combining the data for all geographic regions, we selected those nonoccupationally exposed individuals who had measurements for both liver and the related tissue in question.

For each of the three selected related tissues, we ran a linear regression of the related tissue concentrations on liver concentration. The results, for males only, are shown in Table 12.

We conclude that knowledge of liver concentration is of little use in predicting the concentration in other tissues in the same individual.

Table 11. Results of hypothesis testing for geographic differences
(regions ordered largest to smallest; unadjusted medians in parentheses)

1974-75
  Liver         LA (2.399), NM (2.123), GA (1.942), IL (1.451), PA (1.398), CO (1.276)
  Lung          NM (.535), LA (.447), GA (.315), CO (.301), PA (.271), IL (.104)
  Male gonad    LA (.350), PA (.164), GA (.160), NM (.147), CO (.101)

1967-68
  Liver         LA (1.823), NM (1.730), NY (1.500)
  Lung          LA (1.272), NM (1.165), NY (0.558)
  Vertebrae     NM (4.557), NY (1.539), LA (0.759)

Table 12. Linear regression of related tissue concentrations

                              Lung      Vertebrae    Gonad
Number of observations        712       352          199
Intercept                     0.618     1.12         0.912
Slope                         0.074     -0.021       -0.04
Correlation coefficient       0.1       -0.02        -0.045
R2 (a)                        0.01      0.0004       0.002

(a) Amount of variability in the related tissue concentration explained by the regression on liver concentration.

The explanation for the lack of relationship is that both the liver tissue and the lung tissue concentrations, for example, are changing with time and age but at vastly different rates (one is increasing and the other decreasing). It is, therefore, mathematically impossible for these ratios to be constant.

REFERENCES

1. J. F. McInroy, E. E. Campbell, W. D. Moss, G. L. Tietjen, B. C. Eutsler, and H. A. Boyd, "Plutonium in Autopsy Tissue," Health Physics (to appear).

2. G. L. Tietjen and R. H. Moore, "Some Grubbs-Type Statistics for the Detection of Several Outliers," Technometrics 14: 583-97 (1972).

3. S. S. Shapiro and M. B. Wilk, "An Analysis of Variance Test for Normality (complete samples)," Biometrika 52: 591-611 (1965).

4. F. A. Lin and J. K. Haseman, "An Evaluation of Some Nonparametric Multiple Comparison Procedures by Monte Carlo Methods," Commun. Stat.-Simulation Comput. B7(2): 117-28 (1978).

5. W. J. Conover, Practical Nonparametric Statistics, John Wiley, New York, 1971.


Mortality Among Men Employed (Between 1943 and 1947) at a Uranium-Processing Plant*

P. P. Polednak and E. L. Frome
Oak Ridge Associated Universities
Oak Ridge, Tennessee

ABSTRACT

Mortality is described in a cohort of 19,500 white males who were employed at the Tennessee Eastman Corporation in Oak Ridge, Tennessee (in the period between 1943 and 1947), where uranium was enriched by an electromagnetic separation process. As of January 1, 1974, some 5500 men had died, according to records of the Social Security Administration; copies of death certificates were obtained for most decedents. Expected numbers of deaths from specific causes were calculated on the basis of person-years of follow-up from year of first employment and applying death rates for U.S. white males specific for age (five-year groups), cause of death, and calendar year (five-year intervals).

The standardized mortality ratio (SMR) for all causes of death was less than 1.00 (i.e., 0.88), and few of the SMRs for 59 causes or groups of causes were significantly different from 1.00. Mortality from cancer, especially cancer of the lung, was of particular interest since uranium dust was the major potential health hazard. Some workers in the first stage of the electromagnetic separation process were exposed to average air levels of uranium dust that were several times the present maximum permissible levels. There was no evidence for significantly higher mortality from lung cancer or other cancers in comparisons by subgroups based on departments of employment (corresponding to reported average levels of uranium dust) or length of employment in those departments.

Follow-up of this cohort is continuing, and analyses of mortality until the end of 1978 (i.e., 31 to 35 years after employment) are in progress. Various methods of follow-up and ascertainment of deaths are being utilized.

INTRODUCTION

The Tennessee Eastman Corporation (TEC) operated the Y-12 facility from 1943-1947 and was engaged in the production of uranium chloride from uranium oxide and its enrichment by an electromagnetic separation process. The uranium enriched in 235U was sent to Los Alamos for atomic weapons production. This cohort was selected for analysis of mortality for several reasons, including (1) the long period of follow-up so that any effects of radiation exposure may have begun to appear, (2) the availability of data on average air levels of uranium dust showing high levels in certain departments at the plant, (3) the large sample sizes available, and (4) the paucity of data on the possible long-term health effects of uranium exposure in human populations.

MATERIALS AND METHODS

A total of about 45,000 persons worked at TEC between 1943 and 1947; preliminary analyses have been limited to those who did not remain at the Y-12 plant when Union Carbide assumed operation in 1947 and did not work at other Oak Ridge nuclear facilities. About 47% of these 38,000 workers were women, but preliminary analyses have focused on male workers because the method used to ascertain deaths - that is, searches by the Social Security Administration (SSA) - is less complete for females.

*The submitted manuscript has been authored by a contractor of the U.S. Government under Contract No. DE-AC05-76OR00033. Accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes.

Excluded from these preliminary analyses were men: (1) who worked for less than two days at TEC; (2) whose name and Social Security number in company records did not match those in SSA records, and hence whose death could not be ascertained; and (3) who had missing or erroneous data (i.e., sex, race, birth year, or hire year) required for analysis of mortality. This yielded a sample of 18,869 white males (Table 1).

Causes of death were coded (according to the Eighth Revision of the International Classification of Diseases) from death certificates, which were obtained for almost all persons reported as dead by the SSA. In some analyses, expected numbers of deaths were obtained by use of a procedure [1] that applies death rates for U.S. white males, specific for age and calendar year (five-year intervals), to person-years of follow-up (i.e., from year of first employment at TEC until death or the end of 1973).

Table 1. Mortality from selected causes of death among white males who worked at Tennessee-Eastman Corporation (1943-1947), N = 18,869

Selected cause of death                                        Observed    Expected(a)    SMR(b)
All causes                                                       5394       5771.82       0.93
All infective and parasitic                                        96        131.40       0.73
All cancers                                                       886       1042.00       0.85
Buccal cavity and pharynx                                          29         36.54       0.79
Digestive organs, peritoneum                                      210        324.67       0.65
Esophagus                                                          16         25.17       0.64
Stomach                                                            53         72.33       0.73
Large intestine                                                    47         93.21       0.50
Rectum                                                             13         40.67       0.32
Liver                                                              14         24.66       0.57
Pancreas                                                           57         59.23       0.96
Respiratory system                                                339        320.50       1.06
Larynx                                                             12         17.10       0.70
Lung                                                              324        296.47       1.09
Bone                                                                6          6.68       0.90
Skin                                                               17         17.85       0.95
Prostate                                                           49         60.71       0.81
Testes                                                              4          7.29       0.55
Bladder                                                            26         32.32       0.80
Kidney                                                             20         26.54       0.75
Eye (1950+ only)                                                    1          1.11       0.90
Brain and C.N.S.                                                   32         33.83       0.95
Thyroid (1950+ only)                                                0          2.44       0.00
Lymphosarcoma, reticulosarcoma (1950+ only)                        17         25.39       0.67
Hodgkin's disease                                                   9         16.38       0.55
Leukemia, aleukemia                                                40         43.57       0.92
Other lymphatic tissue                                             11         19.27       0.57
Benign neoplasms                                                   15         16.39       0.92
Allergic, endocrine, metabolic, nutritional diseases (1950+ only)  56         86.07       0.65
Diseases of blood, blood-forming organs                             8         15.72       0.51
Mental, psychoneurotic, personality disorders (1950+ only)         33         24.24       1.36
Diseases of nervous system                                         38         49.30       0.77
Diseases of circulatory system                                   2571       3035.98       0.85
Diseases of respiratory system                                    340        310.11       1.10
Pneumonia                                                         113        121.67       0.93
Emphysema                                                         100         85.89       1.16
Asthma                                                             17         17.83       0.95
Diseases of digestive system                                      228        297.91       0.77
Diseases of genito-urinary system                                  63         99.04       0.64
Diseases of skin, cellular tissue                                   3          4.05       0.74
Diseases of bone, organs of movement                               11          8.99       1.22
Symptoms, senility, ill-defined conditions                        151         53.89       2.80
All external causes of death                                      623        571.77       1.09
Motor vehicle accidents                                           208        182.31       1.14
Unknown cause (death certificate not yet obtained)                261

(a) Expected numbers of deaths are based on person-years of follow-up (from year of hire to death or January 1, 1974) and death rates for U.S. white males (see text).
(b) SMR = observed number of deaths / expected number of deaths.


Cause-specific standardized mortality ratios (SMRs) were compared across subgroups defined, prior to analysis, on the basis of average levels of uranium dust. A summary chi-square test [2] was used to test the statistical significance of the difference between observed and expected numbers of deaths; 95% confidence intervals were calculated for each SMR using the Poisson assumption or the normal approximation, depending upon the number of deaths observed [3].
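As a rough illustration of that machinery (not the authors' program), the sketch below applies reference death rates to person-years in a few hypothetical strata to obtain an expected count, then forms an SMR with a 95% interval, using exact Poisson limits for small counts and a normal approximation otherwise.

```python
# Hedged sketch of an SMR calculation. The stratum person-years and reference
# rates are illustrative placeholders, not the study data.
import numpy as np
from scipy import stats

def expected_deaths(person_years, reference_rates):
    """Sum of person-years x reference death rate over strata (rates per person-year)."""
    return float(np.sum(np.asarray(person_years) * np.asarray(reference_rates)))

def smr_ci(observed, expected, level=0.95, small=20):
    smr = observed / expected
    a = (1.0 - level) / 2.0
    if observed <= small:        # exact limits from the chi-square/Poisson relation
        lo = stats.chi2.ppf(a, 2 * observed) / 2.0 if observed > 0 else 0.0
        hi = stats.chi2.ppf(1.0 - a, 2 * (observed + 1)) / 2.0
        return smr, lo / expected, hi / expected
    z = stats.norm.ppf(1.0 - a)  # normal approximation for larger counts
    se = np.sqrt(observed) / expected
    return smr, smr - z * se, smr + z * se

py = [5.0e4, 1.2e5, 9.0e4]          # person-years in three hypothetical strata
rates = [2.0e-4, 1.1e-3, 3.5e-3]    # illustrative age/period-specific reference rates
E = expected_deaths(py, rates)
print(smr_ci(observed=324, expected=E))
```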

Uranium Exposure Data

Film badges were not worn by workers at TEC, but exposure levels to penetrating external radiation were low due to the nature of the operation. Air-sampling data are useful in approximating the average exposure of a group of workers performing the same repetitive task or operation [4]. Subgroups of TEC workers have been defined on the basis of reported average levels (µg/m3) of uranium dust. Briefly, workers in both stages of the electromagnetic separation process (known as "alpha" and "beta") were exposed to appreciable average air levels of uranium dust. In the alpha-stage chemical departments where UO3 was converted to UCl4, average uranium air levels were several times greater than the then current standard (150 µg/m3). Uranium levels in the urine of over 1000 chemical workers were also determined at TEC by a fluorescent method, but results are not reported here.

Subgroups of workers (Table 2) have been defined largely on the basis of these average levels of uranium dust. Because the electromagnetic-separation process was carried out in separate buildings requiring special security clearance for entry, workers in other departments and buildings (e.g., cafeteria, offices, industrial shops) were not exposed to uranium dust. A subgroup of electrical workers has been defined, however, since these men repaired equipment in the alpha and beta departments.

Thus, the hypothesis to be tested was whether or not working in areas in which the average air levels of uranium dust were high (near, at, or above maximum permissible levels) was associated with increased mortality over a period of 25 to 30 years after employment.

RESULTS

Standardized Mortality Ratios

Table 1 shows the mortality experience as of the end of 1973, with observed and expected deaths from selected causes for all white males. Copies of death certificates were obtained for 5133 of the 5394 men (95.2%) reported as dead; all deaths were included in the "all causes" category (Table 3), but no correction of cause-specific SMRs was made for deaths from unknown cause.

The ratio of observed to expected deaths (or SMR) for all white males (N = 18,869) was less than 1.00 for all causes and for all cancers. Few of the SMRs were statistically significant. On the basis of available data on uranium metabolism and toxicity in man and other animals [5,6], some diseases of particular interest are lung and bone cancer, as well as diseases of the respiratory system and genito-urinary system (because of the high dust concentrations and possible chemical effects of uranium on the kidney). SMRs for lung cancer and bone cancer were 1.09 (0.97 - 1.22, 95% confidence interval) and 0.90 (0.33 - 1.96, 95% confidence interval), respectively.

The mortality ratio was high for the category "symptoms, senility, and ill-defined conditions," but this is composed predominantly of deaths for which the death certificate stated "unknown cause." Also, the death rate for this cause category for Tennessee (from 1940-1960) was about four times the rate for the United States [7]. Low SMRs, such as those for diseases of the circulatory system and infectious disease, may represent the "healthy-worker" effect evident in most occupational cohorts.

The total cohort (Table 1) includes many workers who were probably not exposed to uranium dust at TEC. Table 2 shows SMRs by subgroup based on departments of employment. If a worker was ever employed in an alpha or beta department, he was assigned to that group; if not, he was assigned to the "electrical worker" group or to the "other" group. Within the group of "all alpha and beta departments," a subgroup of chemistry workers was defined; this subgroup was exposed to the highest average air levels of uranium dust, as noted above.

SMRs were generally similar across the groups (Table 2). The "alpha and beta" group and the subgroup of chemical workers did not have higher SMRs than the other groups for all causes, all cancers, lung cancer, or leukemia. The SMR for bone cancer in the chemical workers (1.84) was based on only one observed death.

The possible effect of length of employment on mortality was also considered for the group (alpha and beta workers) exposed to uranium dust. For men who ever worked in an alpha or beta department, mortality ratios for selected causes were calculated for those who worked at TEC for less than one year and for one year or longer (Table 3).


Table 2. Mortality from selected causes of death among white males who worked at Tennessee-Eastman Corporation (1943-1947), by subgroups based on department of employment

                                   Alpha and beta   All alpha and    Electrical      All other       Total
                                   chemistry        beta departments workers
                                   (N = 2051)       (N = 8345)       (N = 1172)      (N = 9352)      (N = 18,869)
Selected cause of death            N      SMR       N      SMR       N      SMR      N      SMR      N      SMR
All causes                         377    0.84      1963   0.90      351    1.02     3080   0.95     5394   0.93
All cancers                        63     0.78      335    0.85      57     0.91     494    0.85     886    0.85
Buccal cavity, pharynx             4      1.40      14     0.99      1      0.44     14     0.70     29     0.79
Digestive system                   14     0.60      80     0.68      11     0.57     119    0.63     210    0.65
Respiratory system                 24     0.94      123    0.98      27     1.36     189    1.08     339    1.06
Lung                               23     0.97      116    0.99      26     1.42     182    1.13     324    1.09
Bone                               1      1.84      2      0.78      0      [0.40]   4      1.08     6      0.90
Skin                               1      1.62      7      0.97      2      1.84     8      0.84     17     0.95
Hodgkin's disease                  1      0.60      4      0.57      0      [1.01]   5      0.60     9      0.55
Leukemia, aleukemia                3      0.84      11     0.65      2      0.77     27     1.12     40     0.92
Diseases of circulatory system     176    0.80      908    0.83      181    1.02     1482   0.84     2571   0.85
Diseases of respiratory system     14     0.62      118    1.05      20     1.11     202    1.13     340    1.10
Diseases of genitourinary system   4      0.55      19     0.54      4      0.69     40     0.69     63     0.64
Unknown cause (death certificate
  not yet obtained)                19 (5.0%)(a)     82 (4.2%)        20 (5.7%)       159 (5.2%)      261 (4.8%)

Mean age at entry                  29.1             31.2             26.1            34.7            33.1
Mean year of entry                 1945.1           1944.2           1944.6          1944.2          1944.7
Total person-years of follow-up    55,447           223,860          30,642          240,240         494,742

Note: Where the observed number of deaths is zero, the expected number of deaths appears in brackets in the "SMR" column.

(a) Percentage of deaths ("all causes") for which a death certificate has not yet been obtained.


Table 3. Mortality from selected causes of death among white males who worked at Tennessee-Eastman Corporation (1943-1947) in selected departments involving highest uranium dust exposure, by length of employment

                                           Length of employment
                                   < 1 year (N = 4337)      > 1 year (N = 4008)
Selected cause of death            Obs. No.    SMR          Obs. No.    SMR
All causes                         999         0.99         964         0.83
All cancers                        158         0.86         177         0.84
Digestive organs                   41          0.76         39          0.61
Respiratory system                 55          0.94         68          1.01
Lung                               50          0.92         66          1.06
Bone                               1           0.82         1           0.74
Leukemia, aleukemia                2           0.25         9           1.02
Diseases of circulatory system     454         0.90         454         0.76
Diseases of respiratory system     61          1.19         57          0.93
Unknown cause                      48 (4.8%)(a)             34 (3.5%)

Mean age at entry                  30.2                     32.3
Mean year of entry                 1945.1                   1944.5
Total person-years of follow-up    115,020                  108,592

(a) Percentage of deaths (all causes) for which a death certificate has not yet been obtained.

The SMRs for the two groups were similar for most causes compared; the SMR for lung cancer was slightly higher in the >1-year group (i.e., 1.06 vs 0.92). The SMR for leukemia was also higher in the >1-year group (i.e., 1.02 with a 95% confidence interval of 0.47 - 2.01), but the SMR in the <1-year group was very low (0.25; 0.03 - 0.90, 95% confidence interval) and based on only two observed deaths. The proportion of deaths for which a death certificate had not been obtained was higher in the <1-year than in the >1-year group.

Logistic Regression Analysis

To evaluate the effect of exposure to uranium dust, individuals were considered "exposed" if employed as alpha, beta, or electrical workers; all others were "nonexposed." Individuals were then placed in one of four age groups based on their age in 1945; and since all workers were employed during a short time interval (1943-1947), this variable represents age at exposure. The following logistic regression function - see [8,9] - is used to describe the effect of age and exposure on the probability of death (on or before January 1, 1974):

    P(d = 1 | f, z) = exp(μ + α'z + βf + fδ'z) / [1 + exp(μ + α'z + βf + fδ'z)] ,   (1)

where f = 1 for exposed and f = 0 for nonexposed. The vector z is used to model the effect of the covariate "age," and the dependent variable d = 1 if an individual showed the response of interest and d = 0 otherwise. Table 4 shows the number of individuals at risk in the ith group (n_i), the number that respond (r_i), and the values of f_i and z_i for the ith group. The dependent variable r_i is assumed to follow the binomial distribution with expected value

    E(r_i) = n_i P(d = 1 | f_i, z_i) ,   (2)

and the z's are defined so that μ determines the probability of response in the control group with age < 25. The data in Table 4 correspond to four 2 x 2 tables, and for each such table the odds ratio is given by

    ρ(z) = [P(d = 1 | f = 1, z) P(d = 0 | f = 0, z)] / [P(d = 0 | f = 1, z) P(d = 1 | f = 0, z)] .   (3)

The empirical logistic transformation [8] is given by

    η̂_i = log[(r_i + 1/2) / (n_i - r_i + 1/2)] ,   (4)

and the linear logistic regression equation is

    η_i = log[P(d = 1 | f, z) / P(d = 0 | f, z)] = μ + α'z + βf + fδ'z ;   (5)

that is, the model is linear after the logistic transformation. From Eqs. (3) and (5), it follows that the log odds ratio is

    log ρ(z) = β + δ'z   (6)

and, consequently, if the "interaction" parameters are not required in the model (i.e., δ = 0), then the odds ratio is a constant e^β.

Table 4. Tennessee-Eastman Corporation white males, by age and exposure group

Age in 1945    i    r_i     n_i     p_i      f_i    z_i1   z_i2   z_i3    p̂_i
< 25           1    220     1949    .1129    0      0      0      0       .1149
               2    262     2553    .1026    1      0      0      0       .1010
25-34          3    602     2888    .2084    0      1      0      0       .2016
               4    639     3673    .1740    1      1      0      0       .1793
35-44          5    937     2470    .3794    0      0      1      0       .3745
               6    782     2326    .3362    1      0      1      0       .3413
45+            7    1327    2045    .6489    0      0      0      1       .6624
               8    635     965     .6580    1      0      0      1       .6294

n_i  = number at risk in group i
r_i  = number of deaths (before January 1, 1974) in group i
p_i  = proportion of deaths in the ith group
f_i  = 0 (control) or 1 (exposed)
z_i1 = 1 if age group = 25-34 and 0 otherwise
z_i2 = 1 if age group = 35-44 and 0 otherwise
z_i3 = 1 if age group = 45+ and 0 otherwise
p̂_i  = fitted values obtained under model (9) using the ML estimates given in Table 5


In the data analysis that follows, we fit the following sequence of models:

    Constant:        η_i = μ
    Age effects:     η_i = μ + α'z_i
    Exposure + age:  η_i = μ + α'z_i + βf_i
    Complete:        η_i = μ + α'z_i + βf_i + f_i δ'z_i        (7)

Maximum likelihood estimates were obtained using the iterative weighted least squares procedure described by Charnes, Frome, and Yu [10]. Initial estimates are computed using least squares on the logits defined in Eq. (5). The binomial-ANOVA procedure described by Efron [11] summarizes the results of the analysis, and minus twice the log likelihood function will be used as the measure of "residual variation," that is,

    D(η̂) = -2 Σ_i { r_i log[p̂_i / p_i] + (n_i - r_i) log[(1 - p̂_i) / (1 - p_i)] } ,   p̂_i = exp(η̂_i) / [1 + exp(η̂_i)] .   (8)

Note that D(η̂) has been defined relative to the "complete" model in Eq. (7), where η̂_i = log[p_i / (1 - p_i)] and, consequently, D(p̂) = 0.0. The value of D(η̂) for each of the models in Eq. (7) for the data in Table 4 is given in Table 5. The decrease in residual variation obtained by increasing the number of parameters in the model is obtained by subtracting successive values of the residual variation. For example, to test for the effect of exposure after age has been included in the model, we obtain

    D(μ̂ + α̂'z) - D(μ̂ + α̂'z + β̂f) = 23.63 - 7.32 = 16.31.

This likelihood ratio test statistic has an asymptotic chi-square distribution with one degree of freedom if β = 0 and is significant at the 0.001 level. The model of primary interest in the present situation is

    η_i = μ + α'z_i + βf_i ,   (9)

since under the hypothesis δ = 0, the odds ratio e^β is constant. If this hypothesis is rejected, then the effect of the exposure variable is not constant over the age groups. Consequently, the last row in Table 5 is called a "lack of fit" test for the model, Eq. (9), and is equivalent to applying the Mantel-Haenszel test for homogeneity [2] to the 2 x 2 tables under study. The maximum likelihood estimates for the parameters in Eq. (9) for the "all causes" data in Table 4 are shown at the bottom of Table 5.
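Because Table 4 gives the grouped counts in full, the binomial ANOVA can be sketched directly. The code below is not the authors' program; it fits the Eq. (7) models by iteratively reweighted least squares (the approach the text attributes to Charnes, Frome, and Yu) and evaluates the deviance of Eq. (8), which should reproduce the Table 5 entries to within rounding.

```python
# Sketch of the binomial ANOVA for the "all causes" data in Table 4.
import numpy as np

r = np.array([220, 262, 602, 639, 937, 782, 1327, 635], float)    # deaths (Table 4)
n = np.array([1949, 2553, 2888, 3673, 2470, 2326, 2045, 965], float)
f = np.array([0, 1, 0, 1, 0, 1, 0, 1], float)                      # exposure indicator
z = np.zeros((8, 3)); z[2:4, 0] = 1; z[4:6, 1] = 1; z[6:8, 2] = 1  # age dummies

def fit_logistic(X, r, n, iters=50):
    """Iteratively reweighted least squares for a grouped binomial logit model."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        eta = X @ beta
        p = 1.0 / (1.0 + np.exp(-eta))
        w = n * p * (1.0 - p)                       # binomial weights
        work = eta + (r - n * p) / w                # working response
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * work))
    return beta

def deviance(X, beta, r, n):
    """Eq. (8): residual variation relative to the complete (saturated) model."""
    p_hat = 1.0 / (1.0 + np.exp(-(X @ beta)))
    p_obs = r / n
    term = r * np.log(p_obs / p_hat) + (n - r) * np.log((1 - p_obs) / (1 - p_hat))
    return 2.0 * np.sum(term)

ones = np.ones((8, 1))
models = {
    "age":            np.hstack([ones, z]),
    "exposure + age": np.hstack([ones, z, f[:, None]]),
}
D = {name: deviance(X, fit_logistic(X, r, n), r, n) for name, X in models.items()}
print(D)                                            # ~23.6 and ~7.3, as in Table 5
print("exposure effect:", D["age"] - D["exposure + age"])   # ~16.3
```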

Table 6 shows the number of deaths due to diseases of the circulatory system (CS), all malignant neoplasms (MN), and lung cancers (LC) for each of the age-exposure groups of white males employed at TEC. Results of fitting the sequence of models in Eq. (7) are summarized in Table 7. Estimates of the parameters in model (9) and their estimated standard errors are given in Table 8. For CS and MN, model (9) "explains" the data, and the estimated odds ratios are 0.939 and 0.971, respectively. An approximate 95% confidence interval is (0.858, 1.028) for the CS odds ratio and (0.845, 1.116) for the MN odds ratio.

The chi-square test for lack of fit is significant at the 0.01 level (P = 0.0085) for the LC data. Most of the "lack of fit" results from the oldest age group (45+ years).

Table 5. Binomial ANOVA for Tennessee-Eastman Corporation mortality data

Logistic regression model       Number of parameters    D(η̂)
μ                               1                       3046.72
μ + α'z                         4                       23.63
μ + α'z + βf                    5                       7.32
μ + α'z + βf + fδ'z             8                       0.00

Source           d.f.    Variation
Age              3       3023.09
Exposure         1       16.31
Lack of fit      3       7.32

ML estimates for model (9):
Parameter                    μ         α1       α2       α3       β
ML estimate                  -2.041    .665     1.528    2.715    -.145
Standard deviation of
  estimate                   .052      .058     .057     .062     .036


Table 6. Cause-specific mortality for Tennessee-Eastman Corporation white males by age and exposure group

                            Circulatory      Malignant         Lung
Age in 1945                 system (CS)      neoplasms (MN)    cancer (LC)
< 25      Control (C)       63               26                13
          Exposed (E)       86               31                5
25-34     C                 234              98                45
          E                 259              119               43
35-44     C                 444              155               66
          E                 401              142               55
45+       C                 746              216               57
          E                 344              102               40

Table 7. Binomial ANOVA for cause-specific mortality

                         Variation for
Source          d.f.     CS          MN         LC
Age             3        1830.49     391.67     121.21
Exposure        1        1.82        0.17       0.51
Lack of fit     3        1.39        0.14       11.68 (P = 0.0085)

Table 8. Maximum likelihood estimates for cause-specific mortality

Parameter     CS               MN               LC
μ             -3.334 (.087)    -4.340 (.139)    -5.471 (.244)
α1            .864 (.096)      .981 (.150)      1.219 (.259)
α2            1.827 (.092)     1.636 (.146)     1.856 (.254)
α3            2.793 (.092)     2.213 (.147)     2.095 (.259)
β             -0.063 (.046)    -0.029 (.071)    -0.083 (.155)

Note: Estimates of the standard deviation of each ML estimate are in parentheses.


If we include one interaction term in model (9) for the 45+ age group, we obtain

    η_i = μ + α'z_i + βf_i + δ f_i z_i3 .   (10)

The chi-square test statistic for lack of fit is 4.32 with two degrees of freedom (P = 0.12). ML estimates of the parameters in Eq. (10) for the LC data are given in Table 9, and fitted values under the models defined by Eqs. (9) and (10) are given in the last two columns. The ML estimate of β is β̂ = -0.2744 with an estimated standard deviation of 0.135. From this, we obtain an estimated odds ratio of 0.76 for the exposed group under 45 and a 95% confidence interval of (0.583, 0.990). The estimated log odds ratio for the 45+ age group is β̂ + δ̂ = 0.4109 with an estimated standard deviation of 0.2101. This yields 1.51 for the estimated odds ratio of the 45+ group, and the 95% confidence interval is (0.999, 2.277). This ad hoc analysis suggests a possible increase in lung cancer in exposed workers in the 45+ age group.
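The odds ratios and confidence limits quoted above follow from exponentiating the estimated log odds ratios and their normal-theory limits, as in this small sketch (the two calls simply reuse the estimates and standard deviations reported in the text):

```python
# Odds ratio and approximate 95% confidence interval from a log odds ratio
# estimate and its standard deviation.
import math

def odds_ratio_ci(log_or, sd, z=1.96):
    return (math.exp(log_or),
            math.exp(log_or - z * sd),
            math.exp(log_or + z * sd))

print(odds_ratio_ci(-0.2744, 0.135))   # exposed vs control, under age 45: ~0.76 (0.58, 0.99)
print(odds_ratio_ci(0.4109, 0.2101))   # exposed vs control, 45+ age group: ~1.51 (1.00, 2.28)
```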

DISCUSSION

These data on mortality up to the end of 1973, or 25 to 30 years after employment in a uranium-processing plant, show little evidence to support the hypothesis that working in areas with high average levels of uranium dust has influenced subsequent mortality. Mortality ratios (SMRs) for cancers of lung and bone, leukemia, and diseases of the respiratory system and genitourinary system were of particular interest with respect to possible effects of uranium exposure. SMRs for these causes of death were not statistically significant (Table 1) and did not tend to be higher in the groups exposed to the highest average levels of uranium dust (Table 2).

The mortality ratios reported in this study should be interpreted with caution. First, many of these persons spent most of their lives in Tennessee, and death rates for certain cancer sites are lower in Tennessee than in the United States.

Table 9. Logistic regression analysis of lung-cancer mortality for Tennessee-Eastman Corporation white males

                          Deaths due to    Number       Deaths per 1000
Age in 1945               lung cancer      at risk      Observed    Expected,    Expected,
                          (r_i)            (n_i)                    Eq. (9)      Eq. (10)
< 25      Control         13               1949         6.7         4.2          4.6
          Exposed         5                2553         2.0         3.9          3.5
25-34     Control         45               2888         15.6        14.0         15.5
          Exposed         43               3673         11.7        12.9         11.8
35-44     Control         66               2470         26.7        26.2         28.5
          Exposed         55               2326         23.6        24.2         21.8
45+       Control         57               2045         27.9        33.0         27.9
          Exposed         40               965          41.5        30.5         41.5

ML estimates for Equation (10):
Parameter              μ         α1       α2       α3       β        δ
Estimate               -5.372    1.218    1.842    1.820    -0.274   0.685
Standard deviation     .246      .259     .185     .280     .135     .250


Ratios of age-standardized mortality rates for white males in 1950-1969 (i.e., the rate for Tennessee divided by the rate for the United States) were 0.67 for cancer of the colon and 0.49 for cancer of the rectum; ratios for lung cancer (0.88) and for all cancers (0.84), however, were closer to 1.00 [12].

Second, some correction should also be made for (a) deaths for which death certificates have not yet been obtained (these are included in the "all causes" category) and (b) incomplete ascertainment of deaths by the SSA system. An estimate of the latter error is being obtained by more intensive follow-up of samples of TEC workers to determine the number of deaths occurring among persons not reported as dead by the SSA.

Use of internal comparison groups should help to overcome the problem of incomplete ascertainment of deaths, provided that there is no bias in cause of death among nonascertained deaths across the subgroups compared. One statistically significant result of interest was obtained from the logistic regression analysis. For lung cancer mortality, a possible "interaction" between age at hire (hence, age at "exposure" to uranium dust) and exposure status was obtained (see Table 7). A single interaction parameter for the 45+ age group was added to the model, and the resulting analysis (see Table 9) suggests that the exposed-control odds ratio is higher for lung cancer in the 45+ age group. This possible increased risk of lung cancer in the older "exposed" workers will be explored in more detail in subsequent analyses.

Follow-up of the mortality experience of the entire cohort is being updated to include deaths occurring to the end of 1978. This will provide 30 to 35 years of follow-up and allow further examination of long-term effects such as lung cancer. A case-control study design to evaluate variables such as smoking, uranium dust exposure (by using job titles and location of work, along with air-sampling data), and medical history is also being considered.

ACKNOWLEDGMENTS

This report is based on work performed under Contract No. DE-AC05-76OR00033 between the U.S. Department of Energy, Office of Health and Environmental Research, and Oak Ridge Associated Universities.

Richard R. Monson, M.D., of the Harvard School of Public Health provided a computer program using abstracted U.S. death rates and person-years tabulation appropriate for cohort studies.

Support in computer programming was provided by D. R. Hudson and M. S. Hansard at Oak Ridge Associated Universities and by the Computer Sciences Division at Union Carbide Corporation, Oak Ridge, Tennessee.

Certain data used in this paper were derived from information furnished by the Social Security Administration. The authors assume full responsibility for the analyses and interpretation of the data.

REFERENCES

1. R. R. Monson, "Analysis of Relative Survival and Proportional Mortality," Comput. Biomed. Res. 7: 325-32 (1974).

2. N. Mantel and W. Haenszel, "Statistical Aspects of the Analysis of Data from Retrospective Studies of Disease," J. Natl. Cancer Inst. 22: 719-48 (1959).

3. W. Haenszel, D. B. Loveland, and M. G. Sirken, "Lung Cancer Mortality as Related to Residence and Smoking Histories. I. White Males," J. Natl. Cancer Inst. 28: 947-1001 (1962).

4. M. Eisenbud and J. A. Quigley, "Industrial Hygiene of Uranium Processing," Arch. Indust. Health 14: 12-22 (1956).

5. J. B. Hursh and N. L. Spoor, "Data on Man," in Uranium, Plutonium, Transplutonic Elements, H. C. Hodge, J. N. Stannard, and J. B. Hursh, Eds., Springer-Verlag, New York, 1973.

6. P. W. Durbin and M. E. Wrenn, "Metabolism and Effects of Uranium in Animals," pp. 67-129 in Proceedings, Conference on Occupational Health Experience with Uranium, Stouffer's National Center Inn, Arlington, Virginia, April 28-30, 1975, ERDA 93.

7. R. D. Grove and A. M. Hetzel, Vital Statistics Rates in the United States 1940-1960, U.S. Public Health Service Publication No. 1677, Washington, D.C., U.S. Government Printing Office, 1968.

8. D. R. Cox, The Analysis of Binary Data, Methuen, London, 1970.

9. N. Breslow and W. Powers, "Are There Two Logistic Regressions for Retrospective Studies?" Biometrics 34: 100-105 (1978).


10. A. Charnes, E. L. Frome, and P. L. Yu, "The Equivalence of Generalized Least Squares and Maximum Likelihood Estimation in the Exponential Family," J. Am. Stat. Assoc. 71: 169-72 (1976).

11. B. Efron, "Regression and ANOVA with Zero-One Data: Measures of Residual Variation," J. Am. Stat. Assoc. 73: 113-21 (1978).

12. T. J. Mason and F. W. McKay, U.S. Cancer Mortality by County, U.S. Department of HEW Publication No. 74-615, Washington, D.C., U.S. Government Printing Office, 1974.


A Critique of Person-Rem-Years as an Estimator of Risk from Ionizing Radiation*

Peter G. Groer
Oak Ridge Associated Universities
Institute for Energy Analysis
Oak Ridge, Tennessee

ABSTRACT

It is customary to express the so-called absolute risk from different types of ionizing radiation as the number of excess cancers per person-rem-year. It is obvious that this estimator for the cancer rate depends linearly on time and radiation dose. I examine the circumstances under which the assumption of a linear dependence on time and dose is justified and show that such dependence is a questionable assumption for the radiogenic cancer rate in humans and in animals. Different statistical methods (hazard plotting and total time on test) for the estimation of the cancer rate and for the nonparametric characterization of the tumor appearance time distribution are discussed. An analysis of the data on the effects of 239Pu in beagles and 226Ra in man illustrates the usefulness of these methods.

INTRODUCTION

In prospective studies of the effects of ionizing radiation in man, the "absolute risk" is defined as "the number of excess (radiation-related) cases of cancer per unit of time in an exposed population of given size per unit of dose" [1]. It is customary to choose a reference population of one million. Therefore, the absolute risk is usually given in units of 1/(10^6 person-rem-year). In practice the calculation of the absolute risk proceeds as follows [1]:

a_i = estimated absolute risk for the group of individuals exposed to dose D_i:

    a_i = (X_i/P_i - Y/Q) / D_i = (X_i - Y P_i/Q) / (P_i D_i) ,   (1)

where

    X_i = number of cancer deaths in the group exposed to D_i,
    P_i = number of person-years = total time at risk (on test) in the group exposed to D_i,
    Y   = number of cancer deaths in the unexposed control group,
    Q   = number of person-years = total time at risk (on test) in the unexposed control group.

For easy comparison, the notation of [1] was kept; only the subscript i was added. The arbitrary factor 10^6 for the reference population was dropped. The radiation dose, D_i, was given in rad or rem units.

*This report is based on work performed under Contract No. DE-AC05-76OR00033 between the Department of Energy, Office of Health and Environmental Research, and Oak Ridge Associated Universities.
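A direct transcription of Eq. (1), with made-up numbers standing in for an actual dose group, looks like this:

```python
# Eq. (1): excess cancer deaths per person-year per unit dose in dose group i,
# relative to an unexposed control group. The numbers below are hypothetical.
def absolute_risk(x_i, p_i, y, q, d_i):
    """a_i = (X_i/P_i - Y/Q) / D_i, per person-year per rad (or rem)."""
    return (x_i / p_i - y / q) / d_i

# Hypothetical group: 40 cancer deaths in 25,000 person-years at 100 rad,
# against 30 deaths in 50,000 control person-years.
a = absolute_risk(x_i=40, p_i=25_000, y=30, q=50_000, d_i=100.0)
print(f"a_i = {a:.3e} per person-rad-year")   # multiply by 1e6 for the usual reference population
```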

In the following sections, I will analyze the implicit assumptions that are made in the calculation of a_i and discuss methods for establishing their validity. First, I will concentrate on the implications of Eq. (1) for the distribution of time to death from cancer, then turn to the dose dependence of a_i.

IS THE TIME FROM EXPOSURE TO CANCER DIAGNOSIS OR TO DEATH FROM CANCER EXPONENTIALLY DISTRIBUTED?

The numerator of Eq. (1) is simply the estimated difference of the rates of two homogeneous Poisson processes [2]. If λ_i is the rate of the process in the exposed group and β is the rate in the control group, then Eq. (1) implies that the time to death from cancer in the two groups is distributed with the two probability density functions (p.d.f.'s) λ_i exp(-λ_i t) and β exp(-βt). More realistically, the p.d.f.'s should contain a tumor growth period, g. This period, during which the cancer grows to a diagnosable size or reaches a diagnosable stage, represents the minimum latent period. It is, of course, analogous to the guarantee period in reliability analysis and can easily be estimated [3] if the p.d.f. for the exposed group has the form

    f_i(t) = λ_i exp[-λ_i(t - g)] ,   t ≥ g .   (2)

Many statistical tests for the validity of the assumption that the underlying p.d.f. is a one- or two-parameter exponential are discussed in the literature [3-5]. Here I will not repeat the description of these procedures but concentrate instead on two general, nonparametric procedures, hazard [6,7] and total-time-on-test (TTT) plots [8]. Both methods are widely used in reliability analysis but have been ignored in the analysis of follow-up studies.

Hazard Plots

To prepare a hazard plot, one estimates the cumulative hazard, which is the integral of the hazard rate h(t),

    H(t) = ∫_0^t h(t') dt' ,

of the underlying distribution function, with the following estimator [6,7]:

    Ĥ(t) = Σ 1/r_i .   (3)

The symbol r_i is the reverse rank of the time of death from cancer in the ordered sequence of all times (i.e., withdrawal times and times to death) for one particular group. The sum is extended over all times to death from cancer smaller than or equal to t.

Because

    H(t) = λt   (4)

for an exponential distribution, a plot of Ĥ(t) should be a straight line on linear graph paper.
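A sketch of the estimator in Eq. (3), using hypothetical follow-up times and treating withdrawals as censored observations, is:

```python
# Cumulative-hazard (hazard-plot) estimator of Eq. (3): order all times
# (deaths and withdrawals together), take the reverse rank of each cancer
# death, and sum the reciprocals up to each death time. Data are invented.
import numpy as np

def cumulative_hazard(times, is_cancer_death):
    """Return (death_times, H_hat) pairs for a hazard plot on linear paper."""
    order = np.argsort(times)
    times = np.asarray(times, float)[order]
    death = np.asarray(is_cancer_death, bool)[order]
    reverse_rank = np.arange(len(times), 0, -1)        # number still at risk
    h_steps = np.where(death, 1.0 / reverse_rank, 0.0)
    return times[death], np.cumsum(h_steps)[death]

t = [2.1, 3.5, 4.0, 5.2, 6.6, 7.3, 8.8, 9.9]            # years from exposure
d = [True, False, True, True, False, True, False, True] # False = withdrawal
for ti, Hi in zip(*cumulative_hazard(t, d)):
    print(f"t = {ti:4.1f}   H_hat = {Hi:.3f}")
```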

Examples of hazard plots are shown in Figs. 1 and 2. The data for Fig. 1 came from a follow-up study of radium dial painters [9]. Exponentiality for this subgroup of radium dial painters is suggested by Fig. 1 and cannot be rejected by statistical tests [5].

Fig. 1. Estimated cumulative hazard vs time (from first intake of 226Ra + 228Ra to diagnosis of osteosarcoma) in radium dial painters exposed to more than 230 µCi 226Ra + 228Ra/kg skeleton [9].

Figure 2 shows a hazard plot for the time to death from osteosarcoma in beagles that received injections of 239Pu [10]. This plot on linear graph paper indicates that the underlying distribution has an increasing hazard rate. The "force of mortality" (a synonym for hazard rate) for osteosarcomas increases with time after exposure.

Fig. 2. Estimated cumulative hazard vs time (from injection of 239Pu to death from osteosarcoma) in level-2.0 beagles [10].

An estimator [7] for the variance of Ĥ(t) is given by

    Var[Ĥ(t)] = Σ 1/r_i² .

Total-Time-on-Test Plots

Barlow and Campo [8] suggested TTT plots to test for exponentiality. On such a plot the ratio T(X_k)/T(X_N) is plotted vs k/N, the fraction of items failed, where

    T(X_k) = total time on test up to the kth death from cancer;
    T(X_N) = total time on test up to the Nth (i.e., last) death from cancer, assuming N individuals in the group.

Plots for distributions that are increasing hazard rate (IHR) or decreasing hazard rate (DHR) are, respectively, above or below the 45° line, which corresponds to the exponential distribution [8]. Figure 3 shows a TTT plot for the same data as in Fig. 2. The data points are above the 45° line, confirming the earlier diagnosis of IHR.
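For complete (uncensored) samples, the TTT coordinates can be computed as in the following sketch; the death times are invented for illustration:

```python
# Total-time-on-test (TTT) plot coordinates for complete data: T(X_k) is the
# total time on test up to the k-th ordered death, and T(X_k)/T(X_N) is
# plotted against k/N. Points above the 45-degree line suggest an increasing
# hazard rate.
import numpy as np

def ttt_coordinates(times):
    x = np.sort(np.asarray(times, float))
    n = len(x)
    # T(X_k) = sum over j <= k of (n - j + 1) * (x_j - x_{j-1}), with x_0 = 0
    gaps = np.diff(np.concatenate(([0.0], x)))
    t_k = np.cumsum((n - np.arange(n)) * gaps)
    return np.arange(1, n + 1) / n, t_k / t_k[-1]

times = [1.8, 2.6, 3.1, 3.9, 4.4, 4.8, 5.1, 5.3]   # hypothetical death times (years)
for frac, scaled in zip(*ttt_coordinates(times)):
    print(f"k/N = {frac:.3f}   T(X_k)/T(X_N) = {scaled:.3f}")
```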

Fig. 3. TTT plot for level-2.0 beagles [10].

These two examples demonstrate how easy it is to check graphically for an underlying exponential distribution. Despite this fact and the abundance of analytical tests [3-5] for exponentiality, the validity of this assumption is never verified in the "radiation risk" literature. For short follow-up periods of several years, this assumption is a good approximation, because the force of mortality does not change significantly. For follow-up periods of two or three decades, however, this assumption is dubious and should be tested. Doll [11] showed in his classical study that age-specific incidence (i.e., the hazard rate) for many human cancers is described by h(t) = b t^k (b > 0, k > 0). For all cancers, with the exception of melanoma, k is between 2 and 10. Doll also noted a peaked incidence of leukemia in atomic bomb survivors [11]. This incidence peak is also acknowledged in [1], but the absolute risk estimates for leukemia, based on Eq. (1), ignore any time dependence of h(t).

Similar evidence comes from a study with beagles. In this study, the underlying time distribution for death from osteosarcomas becomes increasingly IHR as the administered amount of 239Pu increases [12] (see also Figs. 2 and 3).

These facts all suggest that the assumption of a homogeneous Poisson process is probably unrealistic, but caution, especially for high doses of radiation, is advisable, as the example in Fig. 1 shows. Large doses of radiation may have an "autotherapeutic" effect and could modify the process of radiation carcinogenesis sufficiently to change the class of underlying time distributions.

DOSE DEPENDENCE OF THE ABSOLUTE RISK

Until exponentiality is proven, all inferences drawn from absolute risk estimates have to be viewed with caution. In particular, the dose dependence of a_i used in [1], the so-called linear dose-incidence model, should not be considered as supported by data, because the analysis of the time dependence of radiation carcinogenesis is suspect. According to this model,

    λ_i = β + αD_i .   (5)

The total rate of the carcinogenic process in group i is the sum of the "background" rate, β, and the radiation-induced rate. The index, i, for α can therefore be dropped. If withdrawal times and times to death (or diagnosis) are known for each dose group, then each Ĥ_i(t) for the group exposed to D_i should be a straight line on linear graph paper, if Eq. (5) holds. The validity of this model could be checked, for example, for the data on atomic bomb survivors, but unfortunately the data are not readily available in such detail. For some tumors and for certain exposures, β is negligible compared to αD_i in Eq. (5). Then the slope of Ĥ_i(t) should be proportional to dose if Eq. (5) holds.

In the terminology of reliability analysis, this means that according to Eq. (5), radiation carcinogenesis is characterized by an accelerating function, D_i t [13].

REM OR RAD: THAT IS THE QUESTION

Rad is a purely "physical" unit, determined solely by the type and energy of the radiation used. Rem, according to at least one definition [1], is rad times relative biological effectiveness (RBE). The RBE, however, is "the experimentally determined ratio of an absorbed dose of a radiation in question to the absorbed dose of a reference radiation required to produce an identical biological effect in a particular experimental organism or tissue" [1]. Because this ratio is determined from absolute risk estimates for the two types of radiation in question, the measure person-rem-years is "twice" as dubious as person-rad-years. For this reason, the use of rad should be preferred to rem.

CONCLUSION

The use of Eq. (1) to estimate absolute risk was criticized because the equation contains two unproven assumptions, of exponentiality and of an accelerating function linear in dose. The hazard rate for many spontaneous cancers in humans increases with time. The example using data from beagles injected with 239Pu also shows an IHR distribution and suggests a hazard function proportional to t^a (a > 1) for some dose groups. These facts demand verification of the two assumptions before Eq. (1) is used for risk estimation.

Two graphical methods and referenced analytical methods that are useful to test these assumptions have been discussed. Now I will describe what is assumed if cumulative hazard or TTT is plotted. These methods assume that the underlying random variables corresponding to withdrawals and competing causes of death are independent and identically distributed. The independence assumption has been questioned [14], and it has been pointed out that serious errors in estimating the net distribution function for one particular cause of death can be made if the risks are assumed to be independent when, in fact, they are dependent [15]. For dependent competing risks, bounds for the net distribution functions can be estimated [15]. These bounds are valid for any type of dependence, and they are far apart if withdrawing and censoring are heavy. It is not known if the bounds are closer together for the types of dependence found in biological systems. It might be possible to answer this question after "biological" dependences have been characterized probabilistically. The analysis of osteosarcoma induction in beagles seems to suggest, however, that estimates of the net distribution function under the assumption of independence are not seriously in error, since the trend of accelerated tumor appearance continues as the data become gradually uncensored at the higher dose levels [12]. The second assumption of identical distribution is unverifiable at present, and this author does not even know what additional data would be needed to verify this hypothesis. The assigning of a unique cause of death presents an additional difficulty, since it is clear that sometimes this involves the subjective decision of a pathologist. When multiple pathological findings are present at the time of death, the pathologist defines the minimum of the random variables for the statistician.

REFERENCES

1. The Effects on Populations of Exposure to Low Levels of Ionizing Radiation, Report of the Advisory Committee on the Biological Effects of Ionizing Radiation, National Academy of Sciences, Washington, D.C., 1972.

2. D. R. Cox and P. A. W. Lewis, The Statistical Analysis of Series of Events, Wiley, New York, 1966.

3. B. Epstein, "Estimation of the Parameters of Two Parameter Exponential Distributions from Censored Samples," Technometrics 2: 403 (1960).

4. B. Epstein and M. Sobel, "Some Theorems Relevant to Life Testing from an Exponential Distribution," Ann. Math. Stat. 25: 373 (1954).

5. B. Epstein, "Tests for the Validity of the Assumption That the Underlying Distribution of Life is Exponential," Part I, Technometrics 2: 83 (1960); Part II, Technometrics 2: 167 (1960).


6. W. Nelson, "Theory and Application of Hazard Plotting for Censored Failure Data," Technometrics 14: 945 (1972).

7. O. Aalen, "Nonparametric Inference in Connection with Multiple Decrement Models," Scand. J. Stat. 3: 15 (1976).

8. R. E. Barlow and R. Campo, "Total Time on Test Processes and Applications to Failure Data Analysis," p. 451 in Reliability and Fault Tree Analysis, SIAM, Philadelphia, 1975.

9. P. G. Groer and J. H. Marshall, Radiological and Environmental Research Division Annual Report, ANL-76-88, Argonne National Laboratory, 1976, Part II, p. 17.

10. W. S. S. Jee, Radiobiology Annual Report, No. 119-253, University of Utah, Salt Lake City, Utah, 1978; C. W. Mays, private communication.

11. R. Doll, "The Age Distribution of Cancer: Implications for Models of Carcinogenesis," J. R. Stat. Soc. A 134: 133 (1971).

12. P. G. Groer and K. Pitts, "Nonparametric Analysis of Osteosarcoma Induction in Beagles," Institute for Energy Analysis Report (in preparation), Oak Ridge Associated Universities, Oak Ridge, Tenn.

13. N. R. Mann, R. E. Schafer, and N. D. Singpurwalla, Methods for Statistical Analysis of Reliability and Life Data, Wiley, New York, 1974.

14. A. Tsiatis, "A Nonidentifiability Aspect of the Problem of Competing Risks," Proc. Nat. Acad. Sci. 72: 20 (1976).

15. A. V. Peterson, "Bounds for a Joint Distribution Function with Fixed Sub-distribution Functions: Application to Competing Risks," Proc. Nat. Acad. Sci. 73: 11 (1976).


Workshops


Workshop I: Model Evaluation

Ronald K. Lohrding, Organizer

SUMMARY

The purpose of this workshop was to provide a forum for considering the problem of model evaluation. The formal presentations included:

1. "Sensitivity Analysis for National Energy Models," Michael D. McKay and Andrew Ford, Los AlamosScientific Laboratory,

2. "Strategies for Model Evaluation," James Gruhl, MIT Energy Laboratory, and

3. "Methodological Issues in Analyzing Automotive Transportation Policy Models: Two Case Studies,"David Roberts, presenting paper for Barbara Richardson, University of Michigan.

These formal presentations were followed by a period of open discussion. The discussion centered around the above presentations and general model evaluation topics.

Model evaluation is a complex interdisciplinary problem that involves systems analysts, modelers, engineers, physical scientists, social scientists, and statisticians. It was agreed that any successful evaluation of a model will require cooperation and expertise to address the problems of sensitivity analyses, assessment of importance of variables, determination of variable selection methods, model validation, and model documentation.

Conclusions include the following: (1) statisticians can contribute as members of model evaluation teams, and (2) sensitivity analysis as presented in the first paper is a logical and important aspect of model evaluation. Further, our goal is to extend development of model evaluation techniques beyond the black box view of a model.


Sensitivity Analysis and a National Energy Model Example*

Michael D. McKay

Los Alamos Scientific Laboratory
Los Alamos, New Mexico

ABSTRACT

Sensitivity analysis, a study of changes in a model output produced by varying model inputs, is much more than estimating partial derivatives. As a part of model evaluation, it is an exploratory process directed toward finding out how and why a model responds to different values of inputs. When viewed as a data analysis problem, the intent of sensitivity analysis is to make an inference about a model based on a sample of observations generated from the space of input values. The validity of the inferences is tied closely to the laws, or assumptions, relating the observations (data) and the model.

INTRODUCTION

This presentation is intended to serve as an introduction to a discussion of sensitivity analysis of complex models. By sensitivity analysis, I mean, very generally, a study of changes in a model output produced by varying model inputs. When viewed as a study of variation, sensitivity analysis is much more than estimating what is called a "sensitivity coefficient," the rate of change of an output with respect to an input. Hence, sensitivity analysis may not be the right term to use to refer to the study. Indeed, statisticians, whose primary function is to study and explain variation, are rarely called sensitivity analysts. Nevertheless, I will use the term sensitivity analysis in the broad sense.

The definition of sensitivity analysis - a study of changes in a model output produced by varying model inputs - does not immediately lead to a definition of sensitivity or to definitions of measures of sensitivity. Rather than trying to develop a sensitivity analysis methodology through the definition/theorem approach, I think it is more useful, at least now, to specify objectives of the study - things one would like to find out about the model - and then to look for analysis procedures (sensitivity measures) that will satisfy the objectives.

*Report LA-UR-79-2895, Los Alamos Scientific Laboratory, Los Alamos, New Mexico. Work supported by the Department of Energy under Contract No. W-7405-eng-36.

Some of the ideas in this paper have evolved from work done for the Nuclear Regulatory Commission and the Department of Energy in the area of model (computer code) evaluation. We will look at what is meant by a model, sensitivity analysis as a part of model evaluation, and a data analysis approach to sensitivity analysis. Finally, some results of an analysis of the National Energy Model COAL2 will be presented as an illustration of what can be done in practice.

THOUGHTS ON SENSITIVITY ANALYSIS

A Model

In this discussion, a model is a computer program (code) that transforms a vector X of inputs into an output Y(t) which is a parametric function of time. Symbolically, we write Y(t) = h(t; X), where h(·) represents the calculations done by the code. The inputs can be model parameters or problem description variables such as initial conditions. The space of input values for which the model is intended to perform correctly is denoted by S, and t is on some closed interval, say [0, T].


When models are used to predict what might happen in a real system under some set of conditions - for example, future energy demands or the response of a nuclear reactor to a valve failure - the value of the input vector X that will give the "correct" or best prediction is usually not known. We will describe this uncertainty in the input values by a probability function f(x) on the input space S. Then, a more complete symbolic description of a model is

    Y(t) = h(t; X) ,   X ~ f(x) ,   x in S .

We will assume that f(·) is known, although in practice we may have only incomplete information about it. Often, S is taken to be a Cartesian product of intervals ("error bands" on the inputs), and f(·) is a uniform probability function.
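Under that setup, a simple way to generate data for the analyses discussed below is to draw input vectors from the uniform probability function on the error bands and run the code at each one. The sketch below does this for a hypothetical three-input `toy_model` standing in for h(t; X); the bounds, sample size, and time grid are all illustrative.

```python
# Uniform sampling of inputs over a Cartesian product of "error bands" and
# evaluation of a placeholder model at each sampled input vector.
import numpy as np

def toy_model(t, x):
    """Placeholder for h(t; X): any deterministic function of time and inputs."""
    return x[0] * np.exp(-x[1] * t) + x[2]

def sample_inputs(bounds, n_samples, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.asarray(bounds, float).T
    return lo + (hi - lo) * rng.random((n_samples, len(bounds)))

bounds = [(0.8, 1.2), (0.05, 0.15), (0.0, 0.5)]   # error bands on three inputs
X = sample_inputs(bounds, n_samples=50)
t_grid = np.linspace(0.0, 10.0, 11)               # the interval [0, T]
Y = np.array([[toy_model(t, x) for t in t_grid] for x in X])
print(Y.shape)                                    # 50 output curves Y(t)
```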

In the following discussions, the parameter t will occasionally be suppressed in the notation for simplicity.

Model Evaluation and Sensitivity Analysis

Model evaluation is concerned with many facets of the Y(t) = h(t; X) relationship, including the logical structure and internal consistency of the computer code, the plausibility of the output for various values of the inputs, and the agreement between the output and experimental or historical data. Because Y(t) is a function of time, model evaluation can be performed for both individual time points and time intervals.

Sensitivity analysis is the part of model evaluation that studies the changes in a model output produced by varying the inputs. As such, it can cover a very broad area with respect to objectives and techniques. The objectives may include determining the rate of change of the output with respect to each input, ranking the inputs with respect to their importance, or specifying the proportion of output variability attributable to each input.

More simply, one might say that the purpose of sensitivity analysis is to see how a model responds to different values of the inputs. Consequently, one would study a model with the aim of discovering things about it. The key is discovery — of predictable results and unexpected ones, of important inputs and unimportant ones, of a wide range of variability in the output or a not-so-wide one.

If we accept the discovery notion, we should view sensitivity analysis as an exploratory investigation until we know what things to look for and how to measure them.

This approach to sensitivity analysis is one of those a statistician can take when studying experimental data. Hence, by looking at sensitivity analysis as a data analysis problem, a statistician might be able to see how to modify and apply statistical analysis methods to the area of sensitivity.

A Data Analysis Approach to Sensitivity

Unless a model has a known and manageable analytic form that can be manipulated mathematically, sensitivity studies will consist of choosing, for example, N vectors of the input for which values of the output are obtained. The N pairs (X1, Y1), ..., (XN, YN) constitute data on which analyses are performed.

As an example, suppose X0 represents the nominal values of the individual inputs and Y0 is the corresponding nominal output value. Let Xi be the vector of inputs with all values at their nominal except for component i, which has been perturbed by a small amount di. An analysis could consist of the calculation of estimates of the partial derivatives as si = (Yi - Y0)/di. One could use the number si as a measure of the sensitivity of the output to input number i.
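To make the calculation concrete, here is a minimal Python sketch of this one-at-a-time perturbation scheme (it is not taken from the paper; the model function, nominal values, and perturbation sizes are hypothetical stand-ins for an actual code):

```python
import numpy as np

def model(x):
    # Hypothetical stand-in for the computer code h(X); any deterministic
    # function of the input vector could be substituted here.
    return x[0] ** 2 + 3.0 * x[0] * x[1]

x0 = np.array([1.0, 2.0])      # nominal input values X0
y0 = model(x0)                 # nominal output Y0
d = np.array([1e-4, 1e-4])     # small perturbations d_i, one per input

s = np.empty_like(x0)
for i in range(len(x0)):
    xi = x0.copy()
    xi[i] += d[i]                        # perturb only component i
    s[i] = (model(xi) - y0) / d[i]       # s_i = (Y_i - Y_0) / d_i

print(s)   # finite-difference estimates of the partial derivatives
```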

Viewed as a data analysis problem, the intent of sensitivity analysis is to make an inference about the model h(t; X) based on the observations (X1, Y1), ..., (XN, YN). Two immediate issues are (1) the selection of input values X1, ..., XN and (2) the analyses to be performed.

Input values should be chosen in a manner related to both the intended use of the model and also the expected sensitivity analysis procedures. The selection of the set of input values may be made by intuitive choice, systematic designs, sampling plans, or combinations of these. (The choice of method might be influenced by the cost of the model calculations.) Clearly, the total merits of the selection procedures cannot be determined without reference to the anticipated analysis.

The analysis of data has as its foundation the procedure used to select the input values and the observed variation among the outputs Y1, ..., YN. These two things together constitute the basis for inferences about the model.

Before continuing, something should be said about the difference between data obtained from a computer code and experimental/observed data familiar to statisticians. There is no "random error" associated with computer calculations. Given an input vector X*, we assume that the calculation h(t; X*) will always produce the same value Y*(t). This fact must be kept in mind when talking about things like the variance of an estimator or a t-test.

In view of the properties of the data for a sensitivity study, classical inference (hypothesis testing and estimation) must take on a slightly different perspective. Usually, we make inferences about a population using some kind of a sample from it. The inferences are based on assumptions about the population and the sample. To denote these assumptions, I will use the term "law" rather than the term "model" (as in a linear model) to avoid confusion with the model Y(t) = h(t; X). Generally, laws are assumptions about where the observations come from and how they are related. Laws need not be complete in specification [i.e., N(μ, σ²) with μ and σ² unknown], but they must be complete in the sense that they allow for legitimate inference.

Laws usually include an explanation (assumed) for variation in the data — for example, experimental error or different values of an independent variable. It is relative to the laws (assumptions) that analysis procedures are judged with respect to things like optimality, smaller variance, and so forth.

In summary, the data analysis approach can be viewed as a two-step process: (1) selecting model input values and (2) performing analyses involving the inputs and associated outputs. The foundation of the analyses is the observed variation among the output values. Laws (assumptions about the relationship among the input values, the output values, and the model) provide the scope within which inferences about the model are made. The validity and effectiveness of the inferences are judged relative to the laws.

Defining Importance

Most of the objectives in the next section include some reference to important inputs (or sensitivity). Hence, some quantifiable meaning should be attached to the term "importance," if possible. We might say that an input is important if a change in its value causes a substantial change in the value of the output. A measure of importance could be the partial derivative (sensitivity coefficient). This approach to importance is reasonable when the relationship between the output and the inputs is linear (i.e., under a "linear model" law). Alternatively, one might look at the variance of the output under the linear law and use the square of the partial derivative multiplied by the variance of the input (or the ratio of this quantity to the variance of the output) as a measure of importance. I prefer this alternative quantity because it combines the rate of change of an input with its range of values, and it is independent of the units of the input.

Propagation of error, as the second technique above is called, can be applied under the law that a Taylor series approximation of h(·) in the inputs X is an adequate representation. In the nonlinear case, however, the partial derivatives are generally nonconstant functions of the inputs. Hence, where to evaluate the partials and even the possibility of using a directional derivative tend to cloud the issue of quantifying importance.
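The propagation-of-error measure just described can be illustrated with a short sketch under the first-order (linear) law; the model function and the uniform "error bands" assumed around the nominal inputs below are hypothetical:

```python
import numpy as np

def model(x):
    # Hypothetical model function, standing in for the code h(X).
    return x[0] ** 2 + 3.0 * x[0] * x[1]

x0 = np.array([1.0, 2.0])                # nominal inputs
half_width = np.array([0.2, 0.5])        # assumed half-widths of the error bands
var_x = (2.0 * half_width) ** 2 / 12.0   # variance of a uniform input on its band

# Finite-difference estimates of the partial derivatives at the nominal point.
d = 1e-4
grad = np.array([(model(x0 + d * np.eye(len(x0))[i]) - model(x0)) / d
                 for i in range(len(x0))])

# Under the linear (first-order Taylor) law, each input contributes
# (dh/dx_i)^2 Var(X_i) to the output variance, and the contributions sum.
contrib = grad ** 2 * var_x
importance = contrib / contrib.sum()
print(importance)   # fraction of output variance attributed to each input
```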

I want to leave the definition of importance as a possible item for discussion in this workshop. The following points might be kept in mind:

1. There are many ways to measure importance relative to the observed variation in a sample of output variables.

2. The laws under which measures of importance are valid should be identified.

3. The effect on the measures of violation of the laws by the model should be known at least qualitatively.

4. Means of detecting violations of laws are needed.

Objectives of Sensitivity Analysis

In this section, I want to list some objectives for performing a sensitivity analysis. Having stated objectives, one can assemble techniques and formulate strategies to create a sensitivity analysis methodology.

In sensitivity analysis, one might want to identify

1. important inputs;

2. important subsets of inputs;

3. important pseudo inputs, such as the product of two inputs;

4. important segments of the range of values of an input, such as a threshold value for importance;

5. inputs that are conditionally important, such as an input A that is important when input B is greater than 2;

6. unimportant inputs; and

7. unusual results associated with specific inputs (or subsets thereof) or their values.

Certainly, many more objectives could be stated, and each model being investigated will present special considerations. Even a partial list of objectives, however, can be used to put together a set of techniques and to formulate a beginning strategy for sensitivity analysis.

A Question for Discussion

What does it mean to say that an input is important or that the output is sensitive to a particular input?


AN ANALYSIS OF THE COAL2 NATIONAL ENERGY MODEL

In the following sections, I will summarize some of the techniques and results from an analysis of the model COAL2. This work was performed in collaboration with Andrew Ford from the Energy Systems and Economic Analysis Group at the Los Alamos Scientific Laboratory [1,2].

The Model

The model used in our study, COAL2, was developed by Roger F. Naill of the Dartmouth College Systems Dynamic Group under a contract, first from the National Science Foundation and later from the Energy Research and Development Administration. The model is designed to allow investigators to test a variety of energy policies that may affect the nation's ability to reduce its dependence on oil imports during a period in which domestic production of oil and gas is on the decline.

In our study, 72 inputs were selected for investigation. Of these, five were assigned discrete probability distributions. The remaining 67 inputs were assigned uniform distributions on intervals. All inputs were treated as having independent distributions.

Thirteen outputs were recorded at 36 successive time points. The outputs Gross Energy Demand (GED) and Average Energy Price (AEP) will be discussed here.

Gathering the Data

The values of the inputs were selected according to Latin hypercube sampling [3] for the continuous variables. For the sample size of 100, the range of each input was divided into 100 equal (probability) length intervals. The intervals were sampled according to uniform distributions to produce 100 distinct values for each of the inputs. The values of each input were assigned at random and without replacement to the 100 runs.
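A minimal sketch of the Latin hypercube scheme described here, assuming independent uniform inputs on given intervals (the interval bounds below are hypothetical), could look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

def latin_hypercube(n_runs, lower, upper):
    """Latin hypercube sample for independent uniform inputs.

    Each input's range is split into n_runs equal-probability intervals,
    one value is drawn uniformly within each interval, and the values are
    assigned to the runs in a random order (without replacement).
    """
    lower, upper = np.asarray(lower), np.asarray(upper)
    sample = np.empty((n_runs, len(lower)))
    for j in range(len(lower)):
        # One uniform draw inside each of the n_runs strata on (0, 1).
        u = (np.arange(n_runs) + rng.random(n_runs)) / n_runs
        rng.shuffle(u)                       # random assignment to runs
        sample[:, j] = lower[j] + u * (upper[j] - lower[j])
    return sample

# Example: 100 runs of 3 hypothetical inputs.
X = latin_hypercube(100, lower=[0.0, 1.0, -1.0], upper=[1.0, 2.0, 0.0])
print(X.shape)   # (100, 3)
```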

For the discrete inputs, the values were assigned at random to the runs in proportion to the probabilities of the values.

After obtaining the 100 vectors of 72 input values, we made 100 runs of the COAL2 model. An additional run with all inputs at their mean value was made to generate the base case (nominal) output values.

Plots giving descriptive statistics for the outputs are given in Figs. 1 and 2. The statistics were computed independently at each of the 36 time points. Figures 3 and 4 show the output plots obtained for the first five runs. These plots typify the data for the two outputs.

Fig. 1. Descriptive statistics for GED (plotted against years for the LHS 100 runs).

Fig. 2. Descriptive statistics for AEP (mean, standard deviation, and minimum plotted against years for the LHS 100 runs).

From this point, our analyses proceed along two parallel paths: (1) formal calculations of a measure of sensitivity as a guide for assigning importance to inputs and (2) informal examination of the data to get a feel for general trends and irregularities. I will talk here about the formal study. An informal study is found elsewhere [2].

Formal Study of the Data

The general approach we use in formal sensitivity analysis is outlined in Fig. 5. An assumption we operate under is that the determination of important inputs from a set of inputs is easier when the number of inputs is small.


Fig. 3. Data plots for GED (runs 1-5 from LHS 100, plotted against years).

Fig. 4. Data plots for AEP (runs 1-5 from LHS 100, plotted against years).

Fig. 5. A formal sensitivity analysis (flowchart: the sampling plan generates values of the inputs X; the model produces values of the output Y(t); variable reduction derives subsets of inputs; analysis procedures yield the sensitivity measures).

If we let the vector of inputs X represent the set of inputs used to generate the output values, we begin by forming "candidate" subsets of X denoted by Xc(t). At each time t, X is like the variable pool in stepwise regression, and Xc(t) is the subset of the pool that has been selected for inclusion in the regression.

We use step-up partial rank correlation to enter inputs, one at a time, into the candidate subset. We continue to include inputs until (1) the magnitude of the largest partial rank correlation is less than a minimum value rs for selection or (2) the magnitude of the partial rank correlation for the last selected variable is greater than a maximum value rf, which measures the sufficiency of the linear fit.

The results from using several values of rs ranging from 0.5 to 1.0 are examined to see how the candidate subsets Xc(t) change over time. We have found that using the 95% critical values from the distribution of the ordinary correlation coefficient (with appropriate degrees of freedom) from normal theory produces pleasant results.

The sufficient-fit criterion is used in the stopping rule because the number of observations (runs) in our studies is not always greater than the number of inputs. We have only seen this criterion active when the number of runs is less than about one-third the number of inputs. The value we use is rf = 0.98.

There are many things that could be said about using partial rank correlation to select the candidate subsets and about the stopping rules. Certainly, the procedures are ad hoc and depend heavily on linearity to be effective. Let me leave this topic open and just mention that we look at the difference between results from ranked and unranked data as an indication of nonlinearity.
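As one possible reading of the selection procedure, the sketch below performs step-up selection on ranks, computing each candidate's partial rank correlation by residualizing on the inputs already selected; the parameters r_select and r_fit stand in for rs and rf, and the stopping rules follow the two conditions stated above. This is only an illustration, not the authors' code:

```python
import numpy as np
from scipy.stats import rankdata

def partial_corr(y, x, Z):
    """Correlation of y and x after removing a linear fit on the columns of Z."""
    if Z.shape[1] > 0:
        A = np.column_stack([np.ones(len(y)), Z])
        y = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
        x = x - A @ np.linalg.lstsq(A, x, rcond=None)[0]
    return np.corrcoef(x, y)[0, 1]

def step_up_prc(X, y, r_select=0.3, r_fit=0.98):
    """Step-up selection of a candidate subset using partial rank correlation."""
    Xr = np.apply_along_axis(rankdata, 0, X)    # rank-transform each input
    yr = rankdata(y)
    selected = []
    while True:
        remaining = [j for j in range(X.shape[1]) if j not in selected]
        if not remaining:
            break
        prc = {j: partial_corr(yr, Xr[:, j], Xr[:, selected]) for j in remaining}
        best = max(prc, key=lambda j: abs(prc[j]))
        if abs(prc[best]) < r_select:           # rule (1): nothing left to enter
            break
        selected.append(best)
        if abs(prc[best]) > r_fit:              # rule (2): linear fit is sufficient
            break
    return selected

# Illustration with synthetic data: the output depends mainly on inputs 2 and 7.
rng = np.random.default_rng(4)
X = rng.random((100, 10))
y = 3.0 * X[:, 2] + X[:, 7] ** 2 + 0.1 * rng.random(100)
print(step_up_prc(X, y))
```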

After constructing the candidate subsets at each time point, two phenomena are often observed: inputs are selected at isolated time points, and, conversely, inputs are selected at all but a few successive time points.


Because of these phenomena, we make the assumption that inputs will not be important at only isolated time points. This assumption leads to a refinement procedure for the candidate subsets. We smooth, or filter, the subsets by removing inputs that were selected only once within a time interval of width w. Likewise, if inputs were selected at two points not farther apart than w, we include them at all time points in between.

The choice of a value for w is critical and requires a study of the results to find a good one. Too small a value for w can produce candidate subsets that vary greatly over time. Too large a value can cause the size of the subsets to get very large. The value used will depend greatly on the modeled event and on the time-step size relative to the dynamics of the model. In the COAL2 study, we used w ranging from three to nine years.
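The filtering rule can be illustrated with a small sketch operating on a boolean time series of "input selected" flags; the treatment of the window w below is one plausible reading of the rule, not the authors' implementation:

```python
import numpy as np

def filter_selection(selected, w):
    """Smooth a boolean time series of 'input selected' flags.

    Gaps of at most w time points between two selections are filled in, and
    selections with no other selection within w points are removed.
    """
    s = np.asarray(selected, dtype=bool).copy()
    idx = np.flatnonzero(s)
    # Fill short gaps between consecutive selections.
    for a, b in zip(idx[:-1], idx[1:]):
        if b - a <= w:
            s[a:b + 1] = True
    # Drop selections that remain isolated.
    idx = np.flatnonzero(s)
    keep = s.copy()
    for i in idx:
        if not any(j != i and abs(j - i) <= w for j in idx):
            keep[i] = False
    return keep

print(filter_selection([0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0], w=3).astype(int))
```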

Summarizing the first stage of the formal sensitivity analysis, we

1. select, from the initial set of inputs X, candidate subsets Xc(t) at each time point using step-up partial rank correlation; and

2. filter the subsets Xc(t) to include or remove inputs depending on their occurrence or nonoccurrence in neighboring subsets.

After establishing the candidate subsets, we use them to calculate partial rank correlation coefficients as measures of sensitivity. Plots of the determinant of the correlation matrix as a function of time are given in Figs. 6 and 7. The plots indicate how well the linear (in ranks) law in the inputs fits the data. The value of 0 indicates a perfect linear fit.

The plots reflect changes in the candidate subsets over time and something of the quality of the laws underlying the analysis procedures. The smooth behavior for GED is easily contrasted with the behavior of the determinant for AEP.

Plots of the partial rank correlation coefficients (PRCCs) are given in Figs. 8 and 9.

For GED, a total of five inputs were selected over the 36-year time horizon. Input 6, long-term growth rate, enters at year 2 and continues as a dominant input through year 36. Input 11, table multiplier for the fraction of energy demanded as electricity, is in candidate subsets until year 26. Changes from the value 0 on the vertical axis show when inputs enter or leave the subsets.

Fig. 6. Determinants of correlation matrix for GED (ranks, plotted against years).

Fig. 7. Determinants of correlation matrix for AEP (ranks, plotted against years).

Fig. 8. Partial rank correlation coefficients for GED (ranks, plotted against years).


Fig. 9. Partial rank correlation coefficients for AEP (ranks, plotted against years).

We interpret the GED results as indicating that input 6 is the most important input from year 3 until year 36. Input 11 is important early and diminishes in importance. After the first 20 years, inputs 6 and 8 are the only single inputs detectable as important.

A comparison of the PRCCs in Fig. 8 with the summary statistics in Fig. 1 shows that the size of the candidate subsets decreases as the variance of the output increases over time. One can also see that the analysis and the linear law are less effective toward later years, where more variation is observed in the data. All things considered, past experience directs us to classify the results for GED as reliable.

The results for AEP show a somewhat different side of possible outcomes in our sensitivity analyses. The behavior of the PRCCs indicates no clear choices for inputs of standout importance. Input 20 dominates at a relatively high level for the first nine years. After that time, no inputs have a very high partial correlation with AEP. The conclusion from the PRCCs is that no single variable stands out as important in a high degree (value of PRCC).

A comparison of the PRCCs for GED with those of AEP can lead to the conclusion that the PRCC is incapable of detecting singly important inputs in some cases. Another conclusion, however, is that single variables are not important for AEP, but that combinations of variables are important, as in interaction in analysis of variance. In this problem, there are 2556 distinct pairs of inputs. I regret to say that I have not pursued this matter further.

In closing, I have presented the basics of a formal analysis procedure we are using. Sometimes it works very well; at other times it doesn't. It does, however, offer indications of what variables to look at first when trying to explain the behavior of a model.

REFERENCES

1. M. D. McKay, F. A. Ford, G. H. Moore, and K. H. Witte, Sensitivity Analysis Techniques: A Case Study, Los Alamos Scientific Laboratory Report LA-7549-MS, November 1978.

2. F. A. Ford, G. H. Moore, and M. D. McKay, Sensitivity Analysis of Large Computer Models - A Case Study of the COAL2 National Energy Model, Los Alamos Scientific Laboratory Report LA-7772-MS, April 1979.

3. M. D. McKay, W. J. Conover, and R. J. Beckman, "A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code," Technometrics 21(2) (May 1979).


Strategies for Model Evaluation

James Gruhl
Energy Model Analysis Program, Energy Laboratory
Massachusetts Institute of Technology
Cambridge, Massachusetts

ABSTRACT

A short discussion about the importance of procedures for evaluating the accuracy of complex computerized policy models, a description of a large number of alternative methods for model assessment, and the selection of cost-effective evaluative strategies are presented. Because the assessment activities are likely to be severely constrained by available funding, time, and manpower resources, there are potentially great dividends that will result from the careful consideration and selection of assessment techniques. Strategies for assessment should be chosen with particular attention directed toward assessment objectives and model characteristics.

NECESSITIES AND DIFFICULTIES OF EVALUATIONS

Complex computerized models have become a dominant means for conducting quantitative research and are becoming common vehicles for communicating ideas and research results. Recently, however, decision makers have experienced rapidly growing personal skepticisms and have identified professional risks associated with the use of these large computerized models, especially those models that operate far beyond the limits of individual human understanding. If the development and use of these models is to continue to grow, then there also must be a growth of objective techniques for assessing the quality of the predictions that result from the use of these complex models. This paper outlines a number of these general assessment techniques that have been developed and used in our major energy model assessment projects.

Figure 1 shows a general sequencing of the modeling process with construction at the top of the figure and application at the bottom. As mentioned, the principal motivation for constructing models is generally as a means of holding large amounts of complex interdisciplinary information.

*This work was principally supported by the Systems Program, Energy Analysis and Environment Division, Electric Power Research Institute. Some of these developments resulted from research sponsored by the Office of Applied Analysis, Energy Information Administration, U.S. Department of Energy.

Fig. 1. Flow of activities in the development and utilization of a model. (Flowchart boxes include REALITY; OBSERVATIONS, the historical or experimental data used to build the model; STRUCTURAL IDEAS, the variables, forms, and parameters; the CONSTRUCTED MODEL, reached through calibration; data or controls intended to vary from run to run; the MODEL as used, reached through application; and OUTPUTS, the information of interest for a series of model runs. The ultimate comparison of outputs with reality is not available for measurement.)


This information originates as data and structural observations of an object or system intended to be conveniently reproduced by the model. The lower part of Fig. 1 represents the model's utilization, with the outputs attempting to represent some understanding of the characteristics, future course, or response to controls of the original object or system.

The ideal form of model evaluation would involve taking the outputs from a model and comparing them with reality. Since most models are intended to be predictive of future events or other effects for which no measurement system exists, the ultimate validity comparison is generally not possible. In Fig. 1 this ultimate method of assessment is shown as the dashed arrows on the lower right of the diagram. Such a comparison is possible when a model is intended as a simplified characterization of a prototype or system that can be checked with any number of repeatedly and parametrically designed observations. When such a comparison is not possible, there is no evaluation alternative other than tracing backward through all the other sections of Fig. 1, checking all of the phases of model use and development (going from the bottom to the top):

• misinterpretations of outputs,

• misspecifications of input scenarios,

• misuses of model,

• errors or compromises in calibration,

• assumptions in structural specification, and

• measurement errors or synthetic observations.

This sequence of assessment activities is now turned around, since the most systematic procedure for such assessments is to trace the actual chronology of model construction and use.

EVALUATING MODEL CONSTRUCTION

Ideally, model evaluation should be conducted in parallel with both model development and model application. In such cases, there are assessment opportunities that are not otherwise available:

1. the model constructors can be questioned about decisions at a point in time closer to their original thought processes,

2. additional documentation may be stimulated, and

3. feedback of evaluative ideas is more likely to affect the construction of the model, both due to expediencies and to the lesser extent of model-builder ego identification with the model in its earliest stages.

It is, however, possible to evaluate even mature, undocumented models by mimicking the procedure that might have taken place in the development of that model.

Model development is probably most accurately characterized as a process that is an erratically progressing set of parallel and feedback modeling activities. It is a difficult and somewhat arbitrary task to define discrete stages of model development. The flowchart in Fig. 2 represents one attempt at a quite detailed outline of model construction activities. Each of the development stages in this figure represents a group of similar activities. The flowchart, in moving from top to bottom, moves from the ideal to the actual. Thus, each arrow of the flowchart represents an actualization of ideal elements, an actualization which is manifested as compromises, approximations, and other kinds of points of corruption in the model development process.

The question now arises as to what types of assessments are most appropriate at the various stages of model development.

Fig. 2. Steps in the process of model development, from ideal to actual, and potential points (arrows) of corruption of the procedure. (Stages of development: motivation, conceptualization, research plan, empirical research, implementation, and assessment. The development process runs from intended applications and available analytic tools, through structural decisions (tools, disciplines, spatial, temporal, and informational resolutions; inputs, variables, and outputs) and measurement decisions, to the analytic structure (methodologies, functional relationships, behaviors), collection of empirical data or generation of synthetic data, calibration procedures and hypothesis testing, the operational structure, and finally verification and validation.)


Some of these assessments include global perspective, counter-analysis, data error estimation, counter-modeling, respecification, aggregation or disaggregation, data splitting, measures of fit, ridge regression, sensitivity of fit, checks against new data, structural error estimations, and comparisons with structures of other models [1]. Each of these activities is an attempt to probe for the existence and extent of corruptions that have taken place in the actualization of the model. For example, consider the first step in the model development process, the transfer from motivation to conceptualization. In drawing the free-body diagram around the effects and variables that would be included in the model, the builder has made assumptions about endogenous, exogenous, and untreated effects. These assumptions can conceptually be evaluated by examining one more layer of detail around the free-body diagram, in other words, stepping back and taking a more universal or global perspective. Incidentally, ideal pieces of model documentation can also be identified as associated with the decisions and actions taken at each stage of model development.

EVALUATING MODEL APPLICATION

The next logical focal points for evaluation activities are the potential points of corruption at the stages of the model utilization process (Fig. 3). Here again, the stages move from ideal to actual and track the chronological history, this time of the application process.

Fig. 3. Steps in the process of using a model, from ideal to actual, and potential points (arrows) of corruption in the utilization procedure. (Stages of application: motivation (applications), conceptualization, formulation, implementation, interpretation, and assessment. The utilization process runs through structural changes and input changes, structural change implementation, input scenario specification, operation of the model, output interpretation, and verification and validation.)

Some of the gamut of assessment techniques that aim at the points of corruption of model use include collections of potential policy actions and model uses that motivate the experiments, evaluations of measurement errors and input uncertainties, incorporations of structural uncertainties analysis, propagations of input and structural uncertainties through the model, validations of interpretations, assessments of predictive quality, and evaluations of thoroughness of validation procedures.

EVALUATING EXISTENCE AND MAGNITUDE OF CORRUPTIONS

The previous sections have suggested that an important way of identifying paths for model assessment is to make examinations of single points in the procedure of model activities. It is not difficult, however, to think of examples where such microscopic inspection may be either impossible or cost-ineffective. At several points in the remainder of this paper, a strategic sequence will be discussed that does not depend upon a series of single inspections. This sequence includes these stages:

• specification,

• narrowed screening,

• focused investigation,

• broadened perspective, and

• respecification.

Using these definitions, the assessments discussed in the previous sections are all focused investigations.

What then is needed is some way to broaden the scope of modeling activities that can be assessed with one experiment. Of course, such broadened assessments are quite common, such as input sensitivities mapped into output spaces. As in the identification of points of corruption, here also is a way of identifying all possible evaluation activities, which involves a modularized description of those evaluation techniques. The first component in such a description is the specification of the point at which an action is performed on a part of the model or its process. The second component involves the selection of a particular type of action. Then there is a mode of transferring this action to the point of examination. And finally, there is a form for the examination and a basis for comparison. Table 1 lists the components that we have used in our own assessments or found described in the literature [2]. Most of the terms used in this table have been discussed before, except structural parameters, which are those constants that are intended to remain unchanged for a wide range of different model applications.


Table 1. Evaluation techniques to determine the existence and extent of corruption

Point of action:
 1. None
 2. Observation data
 3. Calibration procedure
 4. Input values
 5. Input variables
 6. Structural parameters
 7. Structural functions
 8. Structural variables (exogenous, or untreated)
 9. Output values
10. Output variables

Type of action:
 1. No action, base case or standard version
 2. Perturbed at point of application
 3. Systematic perturbation
 4. Discrete or distribution error/uncertainty estimations (ellipsoidal or other subspace specification)
 5. Additional fabricated data or structural forms
 6. Additional measured data or more sophisticated structural forms
 7. Selective removal of certain (relevant, irrelevant, or correlated) data or structural forms
 8. Linear or nonlinear model simplifications
 9. Aiming or optimizing with respect to some target set or performance measure
10. Weight values by a measure of certainty or relevance

Transfer mode:
 1. None
 2. Recalibration
 3. Single or multiple runs of full model
 4. Correlation or irrelevance tests
 5. Linear or nonlinear simplified model
 6. Portion of decomposed model
 7. Model with one or more new components
 8. Simplified or full model inverse

Point of examination:
 1. Output variables
 2. Output values
 3. Structural variables (endogenous, exogenous, or untreated)
 4. Structural functions
 5. Structural parameters
 6. Input variables
 7. Input values
 8. Calibration procedure
 9. Observation data

Form for examination:
 1. Direct examinations
 2. Tabular or matrix displays
 3. Graphic or distributional displays
 4. Additive differences
 5. Gradients or percentage differences
 6. Comparative ratios or other multiplicative differences
 7. Fit, confidence, or uncertainty discrete measures

Basis for comparison:
 1. Comparison with other empirical models
 2. Comparison with theoretic or analytic models
 3. Comparison against hand calculations or reprogrammed components
 4. Data splitting by time, region, or type
 5. New data with time, experiments, or simulations
 6. Optimality or stability
 7. Comparison with understanding, reasonableness, and accuracy
 8. Examination of appropriateness and detail


One thing that is immediately clear about Table 1 is that not all of the over 300,000 combinations represent valuable assessment techniques. It is clear that some guidance to the important entries in this opportunity set is necessary. (Note: ref. 1 lists about 60 important combinations.) Using the module numbers of Table 1, some of these more important assessment techniques include:

• 433237, input/output sensitivity mappings [3];

• 443277 and 643277 simultaneously, propagation of input and structural uncertainties [4];

• 781527, describing function techniques [5];

• 445227, uncertainty propagation through describing functions [5];

• 252251, strength measures [1];

• 872517, ridge regression;

• 928757, contributive analysis; and

• 272544, structural parameter investigations with data splitting.

Another thing that is immediately clear upon inspection of Table 1 is that with the huge number of available assessment techniques there must be some overall assessment strategy for the effective selection of appropriate techniques.

GENERAL STRATEGIC FRAMEWORK

Because the assessment process is likely to be severely limited in resources such as time, funding, and expertise, it is important to conduct a carefully orchestrated, cost-effective assessment strategy. In the assessment of the ICF Coal and Electric Utilities Model [3], a number of assessment objectives and model characteristics led to the sequencing of seven categories of model runs:

• equivalence runs,

• effects of verification corrections,

• new standard scenario,

• screening runs,

• individual issue runs,

• combined issue runs, and

• new standard scenario.

First, the equivalence runs were used to establish that the in-house version of the model was comparable or identical to the version of the model selected for assessment. After reprogramming of critical sections of the model, effects of discovered programming errors were investigated. This investigation led to a new standard base-case version of the model, which was followed by a set of screening runs. Several groups of many model changes were set up, each of the changes estimated to yield output perturbations that were roughly in a consistent direction and of reasonable magnitude relative to the other changes in its group. The counter-intuitive or interesting results were successively identified as coming from smaller and smaller groups of changes, until the important issues could be confronted with individual strategies. Finally, the most important issues were combined to determine nonlinear effects, and a new standard base case was eventually defined. This procedure is generalized in the flowchart shown in Fig. 4.

STRATEGIC CONSIDERATIONS

To make choices between alternative assessment paths, the objective or goals of the assessment must first be clearly understood (Table 2). The most obvious, and perhaps best defined, of these objectives is the validation of the input data and structural form of the model relative to specific applications. It would probably be more useful to make statements about the appropriateness of a model for contributing information to future policy decisions in generic application areas. Two objectives that would be difficult to achieve simultaneously would be (1) suggest improvements and (2) establish credibility. The first of these suggests a series of model versions, while the second suggests a single model version established at the beginning of the assessment.

As mentioned in the previous section, model characteristics also play a key role in the selection of a clever systematic assessment strategy (Table 2). The ICF model mentioned previously was very expensive to run, thus requiring as an important strategic factor a minimum use of the full model. For models in development, or for immature models, conceptual overviews may be sufficient for reaching major conclusions.

Finally, based upon considerations of assessment objectives and model characteristics, there are a number of alternative settings, depths, and scopes that can be chosen to fit those considerations (Table 3). Some of these alternatives result from the possibility of different assessor identities. In addition, the assessors could address either a single model or several comparable models. There could be very different expectations from the assessment process depending upon this choice of setting. For instance, the model builder could obviously provide a very cost-effective assessment, but credibility would be difficult to establish under such circumstances.


Fig. 4. General strategy for independent evaluation - activities preceding and including model runs. (Flowchart stages include documentation and code procurement and examination; equivalence testing; reprogramming of components or the full model; runs with verification corrections; a (new) standard scenario; full-model, component, and subcomponent testing via group screening, subgroup screening, and individual issues, with possible reprogramming or respecification; and interaction with model builders, sponsors, and users.)

Table 2. Important considerations in developing a cost-effective strategy

Assessment objectives:
Validate specific past applications
Validate generic future applications
Suggest model improvements
Create resource group with model expertise
Establish credibility among users
Test model transferability
Further art of model assessment

Model characteristics:
Perceived importance of model, half-life of its usefulness
Model size, complexity, and types of analytic methodologies
Cost and time per run
Operational transferability, including extent and availability of documentation
Range of intended applications
Previous model use and assessments
Stage of model development and maturity
Maturity of subcomponents and assessment history of those components
Use of existing data bases and the assessment history of those data


Table 3. Organizational strategy options

Setting:
One or several model builders
One or several model sponsors or users
One or several independent assessors
One or several comparable models

Depth:
Literature review
Overview assessment
Audit
In-depth assessment

Scope of evaluation:
Documentation
Data or input validity
Structural validity
Predictive validity
Verification of implementation
Operational characteristics

Scope of applications:
Evaluate specific applications
General evaluation

For each possible assessment setting, there are four potential depths:

1. literature review: survey and critical analysis of the published literature on the structure, implementation, and applications of one or more models;

2. overview: literature review plus analysis of computer codes and other unpublished materials;

3. independent audit: overview plus the conduct of model exercises designed by the assessor and executed by the modeler with the assessor "looking over the shoulder"; and

4. in-depth: independent detailed assessment of model formulation, structure, and implementation with the assessor in control of the model and associated data base.

The most cost-effective of these depths will depend upon a number of model characteristics, most particularly, model maturity.

As has been pointed out in several recent papers, an important way to limit the assessment process is to limit its scope (see Table 3). First, decisions must be made concerning the version(s) of the model to be assessed and the types of model applications at which the assessment process is to be aimed. Different aspects of the model that can be evaluated in an assessment include documentation, validity, verification, or operational characteristics (Table 3).

The development and choice of the most cost-effective validation strategies is virtually a new field of research. It seems as though systematic techniques and procedures may well be the most appropriate. Even given good systematic validation techniques, there are a number of reasons why model assessments are difficult to conduct:

1. assessments can be as time consuming as the whole model-building process,

2. model sponsors generally do not appreciate the additional funding requirements, and

3. decision makers who use models do not insist on quality measures.

In addition, such assessment methodologies would likely be both model- and application-specific, and probably must await the development of more systematic modeling formalisms.

ACKNOWLEDGMENTS

This methodological and conceptual research has resulted from energy model analysis conducted at the MIT Energy Lab with funding provided primarily by the Electric Power Research Institute (EPRI), Richard Richels, program monitor. Additional sponsors have included the U.S. Department of Energy (DOE), Energy Information Administration, Office of Applied Analysis, George Lady, program monitor, as well as other DOE and U.S. Environmental Protection Agency programs. David Wood is program director of the MIT Energy Model Analysis Program and, along with Edwin Kuh of MIT, has been principal investigator of the EPRI projects. Other major participants in this research include, among others, Fred Schweppe and Neil Goldman of MIT.


REFERENCES

1. J. Gruhl, "Model Validation," pp. 536-41 in Proceedings 1979 Cybernetics and Society, IEEE Systems, 79CH1424-1SMC, October 1979.

2. J. Gruhl and N. Gruhl, Methods and Examples of Model Validation - A Partially Annotated Bibliography, MIT Energy Model Assessment Program, MIT-EL 78-002WP, Cambridge, Mass., 1978.

3. N. Goldman and J. Gruhl, "Assessing the ICF Coal and Electric Utilities Model," in Proceedings of Workshop on Validation and Assessment of Energy Models, National Bureau of Standards, Gaithersburg, Md., January 1979.

4. J. Gruhl, Alternative Electric Generation Impact Simulator - AEGIS, Description and Examples, MIT Energy Lab Report MIT-EL 79-011, Cambridge, Mass., November 1978.

5. F. C. Schweppe and J. Gruhl, "Systematic Sensitivity Analysis Using Describing Functions," in Proceedings of Workshop on Validation and Assessment of Energy Models, National Bureau of Standards, Gaithersburg, Md., January 1979.


Methodological Issues in Analyzing Automotive Transportation Policy Models: Two Case Studies

Barbara C. Richardson
Policy Analysis Division
Highway Safety Research Institute
The University of Michigan

Ann Arbor, Michigan

ABSTRACT

A framework for the analysis of econometric models includes model structure analysis, equation reconstruction, sensitivity analysis, forecasting behavior analysis, and model comparison. Issues raised and problems encountered in analyzing two models, the Wharton EFA Automobile Demand Model and the Jack Faucett Associates Automobile Sector Forecasting Model, are presented. These issues and problems deal primarily with the documentation, data, computer programs, versions, and structures of the models.

INTRODUCTION

The use of large-scale mathematical models has increased significantly in recent years, with many of these models having been funded by the federal government. A National Science Foundation study found that between 1966 and 1973 federal agencies other than the Department of Defense supported or used more than 650 models developed at a cost estimated at $100 million [1]. Since 1973, many more models have been developed and used. Unfortunately, models are often used before they are analyzed. A description of the multiple uses and misuses of one large-scale automobile demand model performed prior to model analysis is presented by Saalberg, Richardson, and Joscelyn [2]. Fortunately, however, "There is a growing recognition of the need to assess large-scale models" [3, p. 2].

The purpose of this paper is to present the methods of analysis used in evaluating two econometric automobile demand forecasting models. It is addressed to model evaluators and hopefully conveys to them a workable method for analyzing econometric models and an inkling of the range of problems that they may encounter in performing such analyses; to model builders so that they will know what problems model users, model evaluators, and policy analysts have in using models; and to policy analysts in the hope that they will exercise care in their selection and use of models.

The general framework of analysis that was developed prior to review of the models will be presented first. It is based in part on an analysis framework discussed by Dhrymes et al. [4]. Next are discussions of the issues and problems encountered in performing the analyses.

The two models that were analyzed are the Wharton EFA Automobile Demand Model [5] and the Jack Faucett Associates Automobile Sector Forecasting Model [6-9]. Detailed descriptions of results of the analyses of these two models are provided, respectively, by Golomb et al. [10] and Richardson et al. [11].

The Wharton EFA Automobile Demand Model is an econometric stock-adjustment model designed to forecast long-term equilibrium levels of the size and composition of U.S. automobile demand and stock, given projected vehicle characteristics and general economic and demographic conditions, so that the economic and market impacts of alternative policies can be studied quantitatively. The model contains over 80 statistically derived equations, 300 identities, and 590 variables.

The Jack Faucett Associates Automobile Sector Forecasting Model is a long-term automobile sector forecasting model comprising a stock-adjustment econometric auto demand sector and a computational policy block that represents the supply side of the market. It is small compared with the Wharton EFA auto model and is designed to forecast the effects of such policies as fuel economy standards, gasoline taxes, excise taxes, and rebates. The model estimates the effects of these policies on gasoline consumption, vehicle miles of travel, new car prices and sales, the number of cars in use (by size and age), and fuel economy.

All models have certain strengths and weaknesses and may be appropriate for one use and not for another. Therefore, models should be evaluated with respect to specific purposes and objectives. This philosophy is well described as follows:

In the current state of our knowledge and analytical needs, to concentrate our attention solely on proving or disproving the "truth" of an econometric model is to choose an activity virtually guaranteed to suppress the major benefits which can flow from the proper use of econometric models. Having constructed the best models of which we are capable, we ought to concern ourselves directly with whether or not particular models can be considered to be reliable tools for particular uses, regardless of the strict faithfulness of their specification.

In this context, "validation" becomes a problem-dependent or decision-dependent process, differing from case to case as the proposed use of the model under consideration changes. Thus, a particular model may be validated for one purpose and not for another. In each case the process of validation is designed to answer the question: Is the model fulfilling the stated purpose? We can then speak of the evaluation of these models as the process of attempting to validate them for a series of purposes [4].

This philosophy has guided our analysis to date.

ANALYSIS FRAMEWORK

The general model analysis framework presented here was developed as a tentative plan to be followed in performing analyses of econometric models (primarily time-series) relating to the motor vehicle transportation system. It was developed before analysis of any models began and was viewed as a dynamic framework, changeable as needed, depending on specific characteristics of the models being reviewed.

The five primary tasks included in the model analysis framework are

• model structure analysis,

• model reconstruction,

• submodel evaluation,

• full model evaluation, and

• model comparison.

Model Structure Analysis

The general purpose of model structure analysis is to examine the structure and theory of the model so that the model and its output can be more easily understood.

This task has two aspects. The first involves examining the theory behind specific equations or sets of equations in the model. The theory used in the model must be checked for consistency throughout the model and with observations about reality. An example of a set of equations is the stock-adjustment process commonly used in automobile demand models. A knowledge of the theoretical work extant in the field of the model is required in order to perform this task.

The second aspect of model structure analysis involves examining the overall logic of the entire model. This includes looking at how equations and sets of equations relate to one another and at the assumptions made implicit by model design. Examination of the logic of the model is facilitated by drawing flow charts of the interactions within the model. Information obtained in performing model structure analysis provides useful insights in other steps of the model analysis, including submodel and full-model evaluation and model comparison.

Model Reconstruction

The purpose of model reconstruction is twofold. First, by reconstructing the model on an equation-by-equation basis, the accuracy of the results reported in the model documentation can be verified. Any significant differences should be identified to ascertain whether the model being analyzed is the same model as that reported by the model authors. Second, this effort provides statistical information concerning the quality of the model that is generated in the course of building the model and that can be used to evaluate the model.

Equation reconstruction includes replication of the equations of the model on an equation-by-equation basis using a standard regression package, checking the signs of the estimated parameters against the predictions of the theory, and testing the statistical significance of the parameter estimates and the overall goodness of fit of the equation as reflected in the coefficient of determination (R²). Also included is testing for the first-order autocorrelation of the residuals by using the Durbin-Watson statistic or the appropriate test for equations with lagged endogenous variables.
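A minimal sketch of this kind of single-equation replication, using a standard regression package (statsmodels is assumed here, and the data and variable names are hypothetical), might look like the following:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

# Hypothetical annual data standing in for one behavioral equation of a model.
rng = np.random.default_rng(1)
n = 25
data = pd.DataFrame({
    "income": rng.normal(10.0, 1.0, n),
    "price": rng.normal(5.0, 0.5, n),
})
data["demand"] = 2.0 + 0.8 * data["income"] - 1.2 * data["price"] + rng.normal(0, 0.3, n)

# Replicate the equation with ordinary least squares.
X = sm.add_constant(data[["income", "price"]])
fit = sm.OLS(data["demand"], X).fit()

print(fit.params)                    # check signs against the theory
print(fit.tvalues)                   # statistical significance of the estimates
print("R-squared:", fit.rsquared)    # overall goodness of fit
print("Durbin-Watson:", durbin_watson(fit.resid))   # first-order autocorrelation
```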

Other tasks included in equation reconstruction are studying the stability of the estimated equations, examining the equation estimation techniques, and analyzing the sensitivity of the parameter estimates. The stability of the estimated equations is studied by examining the estimated parameters and summary statistics of the equation over different sample periods. Estimation techniques can be reviewed in several frameworks. First, appropriate estimation procedures on a single-equation basis can be examined. Second, appropriate estimation techniques for individual equations or groups of equations when viewed as part of an entire system of equations can be discussed. Finally, system methods of estimation can be examined.

When appropriate, the sensitivity of the parameter estimates to differing methods of estimation can be analyzed. This can be done by reestimating the equation using alternative single-equation techniques such as instrumental variables or two-stage least squares, or system techniques such as three-stage least squares or full information maximum likelihood, and by comparing parameter estimates.

Submodel Evaluation

Models have certain equations that "drive" the full model (i.e., they strongly influence the dynamics of the entire system of equations). Conceptually, a submodel may consist of either a single equation or a group of equations. Analysis of submodels permits detailed studies of selected equations of the model and can provide valuable insight into understanding the sensitivity and forecasting behavior of the entire system of equations. For example, it aids in pinpointing the sources of unexpected output, cyclical behavior, or theoretically incorrect changes in output variables. Components of submodel evaluation are forecasting behavior analysis and sensitivity analysis.

The forecasting behavior of an econometric time-series model can be studied by running three basic types of simulations that can be used to generate forecasts. These are test, static, and dynamic simulations.

The test simulation tests if the one-period forecasts of the model equal historical values given actual values of current and lagged endogenous right-hand variables in the model. It assigns historical values to all independent variables and then determines the predicted values of the dependent variables. It is the least complex type of simulation for two reasons. First, predicted values are not used for the current endogenous variables that appear as right-hand variables in equations. Second, actual values for all lagged endogenous variables that appear as independent variables in equations are used.

The test simulation has two primary purposes. First, it can be used to determine if the sample data, model, and equation information are accurately coded in the simulation program. Second, it can be used to analyze the basic structure of the equations of the model since actual values are used for all right-hand variables in generating predictions for the left-hand variables within the sample period.

The static simulation produces a series of consecutive single-period forecasts. The model is initialized before each forecast so that actual values of the lagged endogenous variables are used throughout and so that forecasting errors are not compounded. The static simulation is of intermediate complexity. Predicted values are used for current endogenous variables that appear as independent variables in equations. However, as noted, actual values are used for all lagged endogenous variables. The static simulation is essentially a one-period-ahead forecast.

The basic purpose of the static simulation is to test the accuracy of the model in generating one-period-ahead forecasts. The results of a static simulation serve as a convenient reference point to judge the results of dynamic simulations.

The dynamic simulation produces a multiperiod forecast. The model is initialized before the first forecast period and then "operates on its own" in generating forecasts for all dependent variables. It is the most complex and realistic type of simulation. The simulation uses predicted values for both current endogenous variables and lagged endogenous variables in generating predictions for the dependent variables for each time period of the forecasting period.

The basic purpose of the dynamic simulation is to test the accuracy of the model in generating multiperiod forecasts. This is the most demanding type of simulation for a model. It provides a stern test of the model's capability to simulate the system being modeled. This simulation may be divided into three major categories: within-sample, postsample, and future period. Within-sample refers to data used to build and estimate the model. Postsample refers to data outside the sample period of the model but within the realm of historical experience.
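The distinction between the static and dynamic simulations can be sketched for a hypothetical single equation with one lagged endogenous variable (an illustration only, not either of the models discussed in this paper):

```python
import numpy as np

# Hypothetical estimated equation: y_t = a + b * y_{t-1} + c * x_t.
a, b, c = 1.0, 0.6, 0.5

def static_simulation(y_actual, x):
    """One-period-ahead forecasts: actual lagged values are used throughout."""
    return np.array([a + b * y_actual[t - 1] + c * x[t] for t in range(1, len(x))])

def dynamic_simulation(y_actual, x):
    """Multiperiod forecasts: the model generates its own lagged values."""
    y_hat = [y_actual[0]]                 # initialized once, then runs on its own
    for t in range(1, len(x)):
        y_hat.append(a + b * y_hat[-1] + c * x[t])
    return np.array(y_hat[1:])

# Illustrative "historical" series generated from the same equation plus noise.
rng = np.random.default_rng(2)
x = rng.normal(2.0, 0.2, 16)
y = np.empty(16)
y[0] = 4.0
for t in range(1, 16):
    y[t] = a + b * y[t - 1] + c * x[t] + rng.normal(0, 0.1)

print(np.abs(static_simulation(y, x) - y[1:]).mean())   # static (one-period) errors
print(np.abs(dynamic_simulation(y, x) - y[1:]).mean())  # dynamic errors can compound
```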

For both within-sample and postsample simulations, the following indicators of forecasting accuracy can be studied: error statistics, Calcomp (continuous) plots, and tables of actual, predicted, and residual (i.e., actual minus predicted) values for all endogenous variables. These indicators can be generated for each of the three types of simulations discussed (test, static, and dynamic).

Error statistics provide a summary measure of the forecasting accuracy of the model. The following error statistics, as well as others, can be considered for each endogenous variable of the model (a small computational sketch follows the list):

• The mean absolute percentage error is equal to the average of the absolute percentage errors. It is used as a simple summary statistic to measure forecasting accuracy.

• The root mean squared error is equal to the square root of the average of the squared forecasting errors. It is used frequently as a summary statistic to measure forecasting accuracy.

• The root mean squared error as a percentage of the mean is equal to the root mean squared error of the variable divided by its mean. It is an indicator of the relative importance of the magnitude of the root mean squared error.
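
As a minimal illustration of these three statistics, the following sketch computes them for an invented pair of actual and predicted series; the numbers are not taken from either model discussed in this paper.

```python
import numpy as np

# Hypothetical actual and predicted values for one endogenous variable.
actual    = np.array([100.0, 105.0, 98.0, 110.0, 120.0])
predicted = np.array([ 97.0, 108.0, 99.0, 104.0, 125.0])

errors = actual - predicted                               # residuals (actual minus predicted)

mape     = np.mean(np.abs(errors / actual)) * 100         # mean absolute percentage error
rmse     = np.sqrt(np.mean(errors ** 2))                  # root mean squared error
rmse_pct = rmse / np.mean(actual) * 100                   # RMSE as a percentage of the mean

print(f"MAPE = {mape:.2f}%,  RMSE = {rmse:.2f},  RMSE/mean = {rmse_pct:.2f}%")
```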

Calcomp plots are computer-generated, continuous time plots. Time is plotted on the horizontal axis of the graph, while values of the endogenous variable are on the vertical axis. The plots provide a clear picture of the model's capability to track the historical record, including the levels as well as the turning points of the predicted variables. Models are often accurate at predicting levels but inaccurate at predicting turning points of the output variables. However, they may also display accuracy in forecasting turns but inaccuracy in forecasting levels. This is particularly true of models that are dominated by autoregressive behavior and was true of one of the models discussed in this paper.

Finally, tables of actual, predicted, and residual values allow for a period-by-period analysis of the forecasting behavior of the model. The magnitude and direction of forecasting errors can be carefully studied.

The tendency of the model to accumulate forecasting errors as the forecasting horizon lengthens can be studied by using the following procedure:

1. A sample period is selected for the simulation analysis. For purposes of illustration, consider a sample period of 1960-1975 for an annual model.

2. A static simulation is run. Recall that this is a one-year-ahead forecast in which the model is reinitialized each year so that actual values of lagged endogenous variables are always used. Within this framework, the model is not permitted to compound or correct its own forecasting errors.

3. Two-year forecasts are obtained by permitting the model to generate its own lagged endogenous variables for the second year of the forecast. The model first forecasts 1960 and 1961, using its own forecast of 1960 as lagged "data" for the forecast of 1961. The model is then reinitialized as of 1960 and forecasts 1961 and 1962. To forecast 1963 values, the model is reinitialized with 1961 values; the new 1962 forecasts are produced and used to generate the 1963 predictions. The process is repeated through 1974-1975. In this fashion a time series of forecasts for 1961-1975 is obtained, with each being a two-year-ahead forecast.

4. An analogous procedure can be carried out for three-, four-, six-, and eight-year-ahead forecasts. A full-period (1960-1975) dynamic simulation can also be run.

5. Root mean squared errors are calculated for each forecast horizon length.

6. The root mean squared errors for an endogenous variable are then compared to see if they increase significantly as the length of the forecast horizon increases. If they do increase significantly, the model displays a tendency to accumulate forecasting errors as the forecasting horizon lengthens. A sketch of this rolling-horizon calculation follows the list.
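
The following sketch illustrates steps 2-6 for an invented AR(1) stand-in model; the coefficients and "historical" series are hypothetical, and the point is only the mechanics of reinitializing on actual data and collecting errors by horizon.

```python
import numpy as np

# Invented stand-in model and "historical" data (16 annual values, as in 1960-1975).
a, b = 1.0, 0.8
y = 10.0 + np.cumsum(np.random.default_rng(0).normal(0.3, 0.5, 16))

def rmse_at_horizon(y, k):
    """Reinitialize on actual data, forecast k years ahead, and collect the k-step errors."""
    errors = []
    for start in range(len(y) - k):
        lag = y[start]                      # actual value used to reinitialize the model
        for _ in range(k):                  # beyond one step, the model feeds itself
            lag = a + b * lag
        errors.append(y[start + k] - lag)   # error of the k-year-ahead forecast
    return np.sqrt(np.mean(np.square(errors)))

for k in (1, 2, 3, 4, 6, 8):
    print(f"{k}-year-ahead RMSE: {rmse_at_horizon(y, k):.3f}")
```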

The sensitivity of the submodel to specified changes in the independent variables and estimated parameters can also be analyzed. The output of the model given the changed independent variables is compared to the output of the model given some base-case values for the independent variables. The base-case data may be historical data. The differences in the values of the output for each time unit in the comparison period represent the impact on the dependent variable of changes in the independent variables. This type of analysis is referred to as sensitivity analysis, multiplier analysis, or impact analysis. Values of the estimated parameters may also be varied and the impact on the dependent variable measured.

The results of the sensitivity analysis of the model can be displayed graphically or can be listed in tabular form to facilitate more precise study on a period-by-period basis.
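
A minimal sketch of this kind of impact (multiplier) analysis is given below; the "model" is an invented linear demand equation rather than either model evaluated in this paper, and the input series are arbitrary.

```python
import numpy as np

def model(income, price):
    # Hypothetical demand equation standing in for a submodel.
    return 2.0 + 0.5 * income - 0.3 * price

years  = np.arange(1960, 1976)
income = np.linspace(10, 25, len(years))      # base-case (e.g., historical) input values
price  = np.linspace(3, 6, len(years))

base    = model(income, price)                # base-case output
shocked = model(income * 1.10, price)         # scenario: income 10% higher in every year

impact = shocked - base                       # period-by-period impact on the dependent variable
for yr, d in zip(years, impact):
    print(yr, round(float(d), 3))
```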

Full-Model Evaluation

The purpose of full-model evaluation is to analyze the sensitivity and forecasting behavior of the model as a whole. The full-model evaluation results can be compared with the submodel results to determine the importance of the interdependencies in the full-model structure. The sensitivity and forecasting behavior of the entire model can be analyzed using the same procedures as those discussed for submodel evaluation.

Model Comparison

The basic objective of model comparison is to compare the sensitivity and forecasting behavior of similar models. In order to do this, two conditions must hold:

1. a consistent set of assumptions for changes in the exogenous variables must be used, and

2. the results must be generated over identical time periods to minimize the impact of nonlinearities in the models.

Other questions may be asked: What are the particular uses of the models? What are their strengths and weaknesses? Are they good for forecasting or policy analysis or both?

Summary

This framework can be characterized as having five steps: model structure analysis, model reconstruction, submodel evaluation, full-model evaluation, and model comparison.

METHODOLOGICAL ISSUES

For the most part, the analysis framework outlined above was followed in the evaluation of the Wharton EFA and Faucett auto models. Deviations from this procedure are presented briefly here.

In the model structure analysis of the Wharton EFA auto model, it was found that a portion of the theoretical basis of the model was traditional, yet part of it seemed new to the model evaluators, who were familiar with the work done in the field. This necessitated, in addition to extensive literature searching, attempting to understand the basis for and impacts of the new theoretical aspects of the model. One example of this in the Wharton EFA auto model is the estimation of the percentage of new car sales by size class using a stock-adjustment process. This had not previously been done in other auto demand models, and it was considered necessary to trace its impact through the model. This was accomplished primarily by examining the forecasting behavior and sensitivity of the model in the submodel evaluation task.

In both the submodel and full-model analyses of the Wharton EFA auto model, and in the full-model analysis of the Faucett model, in addition to the statistics presented above, the simulated R² (SIML R-SQ) statistic was generated and used in evaluating the models. SIML R-SQ is a measure of the predictive accuracy of the equation as solved in the model simulation.

The formula for SIML R-SQ is

\[
\text{SIML R-SQ} \;=\; 1 \;-\; \frac{\sum_{i=a}^{b} (S_i - h_i)^2}{\sum_{i=a}^{b} (h_i - \bar{h})^2}\,,
\]

where

a = starting year of simulation,
b = ending year of simulation,
S_i = simulated value for year i,
h_i = historical (actual) value for year i,
h̄ = average historical value.

The formula is similar to the one used to calculate R² for a single-equation model. The only difference between the SIML R-SQ and the single-equation R² is in the computation of the endogenous value, which in this case is the simulation value, S_i. In the single-equation model, the endogenous variable is computed using the actual values of the independent variables for each year. In the case of SIML R-SQ, the endogenous variable, S_i, is computed from values generated by the model as well as from actual values.

Over the forecasting interval (a to b), the SIML R-SQ statistic expresses the variation between the historical and simulated values as a fraction of the total variation in the historical values. Since the total variation of the simulated values from the actual values may exceed the total variation of the actual values from their mean, the SIML R-SQ, unlike the R², may have a negative value. That is, $\sum_{i=a}^{b} (S_i - h_i)^2$ may be greater than $\sum_{i=a}^{b} (h_i - \bar{h})^2$.

The SIML R-SQ has the same interpretation for the multi-equation model as the R² has for the single-equation model over the positive range. That is, positive SIML R-SQ values indicate the proportion of the variation of the endogenous variable that can be attributed to the predictive capacity of the model. Negative values can be taken to indicate the model's poor ability to predict. These negative values may also indicate changes in the structure of the modeled system, or instability of the estimated coefficients. A unit value of the simulated R² indicates that the simulation experiment generated exact values of the historical data.
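
A direct computation of SIML R-SQ is sketched below for invented simulated and historical series; the numbers are illustrative only.

```python
import numpy as np

historical = np.array([100.0, 104.0, 103.0, 108.0, 112.0, 115.0])   # h_i, i = a..b
simulated  = np.array([ 99.0, 105.5, 101.0, 109.5, 110.0, 116.5])   # S_i from the simulation

def siml_rsq(simulated, historical):
    resid_var = np.sum((simulated - historical) ** 2)           # sum of (S_i - h_i)^2
    total_var = np.sum((historical - historical.mean()) ** 2)   # sum of (h_i - h_bar)^2
    return 1.0 - resid_var / total_var                          # may be negative

print(round(siml_rsq(simulated, historical), 3))
```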

During the full-model evaluation, the Wharton EFA model output was compared to that of a naive model (the sample mean) for several variables. This comparison raised interesting questions concerning the usefulness of building large-scale econometric models because, in some instances, the naive model outperformed the Wharton EFA model on a root mean squared error basis. The sample periods of both the Wharton EFA and the Faucett models were split into two periods. This permitted a careful analysis of the effects of the different economic and demographic data that applied in the different time periods.

The usefulness of the models for different purposes, broadly labeled as forecasting and policy analysis, was evaluated. Specifically, the accuracy of the forecasts of the models for policy-related variables was studied. This treatment of the analyses addresses the points quoted earlier [4].

Any effort is constrained by the availability of time and resources. Decisions were made before and during the analyses to restrict certain aspects of our inquiries. Judgments were made to select the analysis tasks that would yield the best information within the existing constraints. Necessarily, this required that some steps outlined in the analysis framework not be performed. In the case of the Wharton EFA auto model, these steps included producing ex ante forecasts, studying the tendency of the submodels to accumulate errors, and running test and static simulations of the submodels.

Submodel evaluation of the Faucett model did not occur because the model was too small to divide into meaningful submodels. Note, however, that the model is composed of two main blocks, a policy block and a demand block; these were analyzed separately. Rather than perform what was referred to above as submodel and full-model evaluation, the forecasting behavior and sensitivity of the econometric portion of the model were studied. The forecasting behavior of the entire model was not studied because certain portions of the noneconometric (policy) block of the model (e.g., the projected fuel economy-cost relationships) are not applicable over the model's sample period.

The major problems and issues encountered in performing the analyses are presented below. They deal with documentation, data, computer programs, versions of the model, and the model structure.

Model Documentation

Practically every aspect of the analysis of both models was affected by inadequate model documentation. In examining the structure of the models, it was found that the Wharton EFA auto model documentation was incomplete in that computations and logical connections within the model were not thoroughly explained; thus the logic and structure had to be gleaned from a careful examination of the computer program of the model. Similar problems arose in examining the Faucett auto model but were exacerbated by inconsistencies between the documentation and the computer program.

The overwhelming problem in attempting to reconstruct the equations of the Wharton EFA auto model was inadequate documentation of the historical data tape. In fact, it was impossible even to attempt to reconstruct any of the cross-section equations of the model.

Additional documentation should have been developed because the labels of the variables on the tape did not match those of the variables reported in the existing documentation. Thus, given that there are hundreds of variables in the model, it was difficult, at best, to match variables on the data tape with those needed to run the regressions.

The second documentation problem dealt with the inadequate description of the techniques used in equation estimation. Searching the text of the model documentation and experimenting with different estimation techniques and sample periods failed to produce parameter estimates for several equations that agreed with those in the model. When it proved impossible to obtain parameter estimates that matched those reported in the model documentation, the model authors were contacted in an attempt to obtain the necessary information.

A third problem was that the use of updated time-series data was not documented by the model authors. This, coupled with an apparent time difference between the deliveries of the model report and the model computer program tape, resulted in an inability to reproduce several equations. Undoubtedly, this is not a problem particular to this model. Data series are constantly being revised, and model builders are inclined to use the latest data available. However, ensuring consistency among all parts of the model and its documentation is crucial.

In the equation reconstruction task of the Faucett model analysis, only the equations of the econometric auto demand portion of the model were reconstructed. The data used were those supplied by the model authors. When the coefficients of some of the reconstructed equations did not match those of the original model, it was necessary to obtain additional information concerning the sample period and data sources from the model authors. With the additional information, it was possible to reconstruct most of the econometric equations of the model.

Computer Program

Our main problem with the computer program of the Wharton EFA auto model was that it was incompatible with the computer facilities at The University of Michigan. The computer tape was prepared for running on a DEC 10 machine, whereas The University of Michigan has an Amdahl computer (IBM compatible). Considerable effort was required to implement the model at The University of Michigan.

The program of the Faucett auto model received from the model sponsors had apparently never been test run, because it contained errors in the code that precluded its operation. Once these errors were identified and corrected, it was necessary to make several changes in the model to ensure consistency and to permit the analysis to occur. This point is discussed further in the following sections. One example of a programming error in this model was the use of beginning-of-the-year auto stock in the estimation of the vehicle-miles-of-travel equation and end-of-the-year auto stock in the computer program of the model.

Also in the Faucett auto model, it was very difficult to perform sensitivity tests dealing with one key variable (autos owned per household). This variable was used as input to the computer program rather than being an integral part of it.

Data

While data problems are mentioned in most other sections of this paper, it is useful to call special attention to them. Data problems in model analysis are ubiquitous. The sources of data used in building models are usually clearly stated, but the data actually used are often transformations of the originally published data. Verification of the data used in building the model is difficult without knowing their sources and the exact procedures used in the transformations.

Changes occur in time-series data as new information becomes available, and model builders update their models with the latest data. These changes present a problem if consistency is not maintained throughout the model or if the use of the new series is not documented.

A major problem in attempting to reconstruct some equations of the Wharton EFA auto model was the exclusion of a whole set of data (automotive characteristics) from the model data tape. This data set was not delivered to the model sponsor and thus was unavailable for review. This raises the problem of the unavailability of proprietary data.

Not all aspects of potential data problems were addressed in these analyses. For example, the choice of data series, the exclusion of key data, and the quality and timeliness of those selected are among the issues that should be addressed in performing a model analysis.

Versions of the Model

A classic problem is that of the "moving target." Models frequently and understandably undergo revisions. Unfortunately, documentation does not keep pace. It is often the case that a model is being revised while the analysis of an earlier version is ongoing. It is sometimes difficult even to determine which version is being analyzed. A question that presents itself in this context is whether it is worth performing the evaluation on an early version if a later version addresses the weaknesses of the first. The answer usually should be affirmative, because it is probable that later versions will not have corrected some of the basic theoretical and structural problems of the earlier version.

A related problem arises from the changes made to the model by the model evaluators. Analysis required the alteration of the Faucett auto model. The questions that are raised, then, are how much effort should be spent revising the model and, if it is changed, is it then the same model? Answers to these depend on the individual model analysis situation and the judgment of the model evaluators.

Model Structure

Several issues that arise as a model analysis is performed are related to phenomena that are not inherent in the model, for example, those dealing with data and documentation. The structure of the model, however, may also be a source of problems. This was the case when it became inappropriate to perform submodel evaluation on the econometric portion of the Faucett auto model: that part of the model is simply too small to divide into submodels, and analysis of such submodels would probably not yield substantial information about the entire model.

One situation particular to the Faucett model was the fact that the model was built to examine the impacts of future policies and was not designed to simulate the historical period. This precluded performing historical analyses with the model as it had been built. It was therefore necessary to initialize the model by building into it historical data on various characteristics of the automotive market and other economic and demographic variables. Some of these variables required aggregation or disaggregation to be compatible with the other variables in the model.

Only the econometric portion of the Faucett model was included in the historical simulations that were run. Therefore, output variables from the policy part had to be set exogenously as input to the econometric auto demand portion for simulation experiments over the sample period. Values of these variables (i.e., fuel economy of vehicles and auto prices) had to be estimated, based on the best information available.

Other problems arose in performing historical simulations because of differences between the data used by the model authors in estimating equations and the data in the computer program for forecasting. A great deal of effort had to be expended to reconcile these differences so that the program would run and produce reasonable results. An interesting aspect of this procedure, however, was that as some parts of the program were improved, the output from other parts worsened. The result is that some of the simulation experiments were run using the original version of the model, and some were run using a revised version. While this may not be optimal, we believe it to have been the best approach.

SUMMARY

A model evaluation framework was designed and implemented in the analyses of two econometric automobile demand models. The major tasks included model structure analysis, equation reconstruction, forecasting behavior analysis, and sensitivity analysis. Model comparison was suggested but has not yet been performed. For the most part, standard statistical measures were used in evaluating the models' performances. Major problems that arose in the analyses stemmed mainly from inadequate documentation, simple errors in the model documentation and computer program, the unavailability of historical data, and an inability of model-authoring organizations to provide accurate and complete information on procedures, due mainly to staff turnover or a lack of recollection. Variants in analysis technique were required because of these problems and also because of characteristics inherent in the models. For example, ex post forecasts (run over the sample period) could not be done with one model before we modified it.

There are several inadequacies in this approach that will be addressed in future analyses of models. The major one is that no comprehensive effort was designed to examine and verify the data used in building and running the model. A limited attempt at this was made in the analysis of the Faucett model, but stumbling blocks in the form of transformed data presented tremendous difficulty.

Maintained hypotheses were examined throughout the analyses as they became apparent. A more systematic approach to looking at major maintained hypotheses would be beneficial. This would ensure that important ones were not overlooked in the analysis. In addition, the objectives of the model should be set forth and the model's success in meeting them indicated.

One area that needs attention is the establishment of criteria against which models can be measured. Standards for judging the quality of sample-period error statistics for automotive transportation policy models are unavailable. Therefore, while factual statements can be made concerning a model's performance, conclusions are difficult to draw. Part of the establishment of criteria has to be a review of the field and the output of similar models. But that would only be a first step in this process. Decisions on what is a good model need to be made.

The analyses referred to here are subject to the criticism that they are incomplete: that not all maintained hypotheses were addressed, that only relatively few forecasting and sensitivity tests were performed, and that a small number of submodels were reviewed. Such incompleteness is unfortunate but nevertheless necessary, given constraints of time and money. Tradeoffs in selecting which parts of an analysis to perform must always be made.

Our experience led us to expect unforeseen obstacles to arise in the course of the model analyses. Any research effort must be dynamic to adjust to the unforeseen. A lesson that we have learned is that analyses of this type encounter more than the usual quota of obstacles. The most likely consequence is delay, because the obstacles frequently are associated with obtaining necessary information (e.g., data, model documentation, clarifications from model authors, etc.). Delay can increase costs in addition to extending schedules. We believe it important for sponsors and analysts to recognize the likelihood of delays in planning work. Model analysis is a complex activity that cannot be done well under tight time schedules.

ACKNOWLEDGMENTS

Development of this independent paper was supported by an unrestricted grant from the Motor Vehicle Manufacturers Association. My colleagues, Michael M. Luckey and D. Henry Golomb, collaborated in developing and implementing the analysis framework. Other colleagues who have provided assistance and advice over the course of the analysis study are Daniel H. Hill, Saul H. Hymans, Stephen M. Pollock, Kent B. Joscelyn, Gerald L. Musgrave, David C. Roberts, James H. Saalberg, Prakash B. Sanghvi, Lawrence D. Segel, and Daniel B. Suits. Their contributions are gratefully acknowledged.

REFERENCES

1. G. Fromm, W. O. Hamilton, and D. E. Hamilton, Federally Supported Mathematical Models: Survey and Analysis, U.S. Government Printing Office, Washington, D.C., 1974.

2. J. H. Saalberg, B. C. Richardson, and K. B. Joscelyn, Federal Policy Applications of the Wharton EFA Automobile Demand Model, UMI Research Press, Ann Arbor, Mich., 1979.

3. U.S. General Accounting Office, Guidelines for Model Evaluation, Washington, D.C., exposure draft, 1979.

4. P. J. Dhrymes, E. P. Howrey, S. H. Hymans, J. Kmenta, E. E. Leamer, R. E. Quandt, J. B. Ramsey, H. T. Shapiro, and V. Zarnowitz, "Criteria for Evaluation of Econometric Models," Annals of Economic and Social Measurement 1(3): 291-324, 1972.

5. G. R. Schink and C. J. Loxley, An Analysis of the Automobile Market: Modeling the Long-run Determinants of the Demand for Automobiles. Volume I: The Wharton EFA Automobile Demand Model. Volume II: Simulation Analysis Using the Wharton EFA Automobile Demand Model. Volume III: Appendices to the Wharton EFA Automobile Demand Model, Transportation Systems Center Final Report DOT-TSC-1072, 1977.

6. C. Difiglio and D. Kulash, Marketing and Mobility, The Panel on Marketing and Mobility, Federal Energy Administration [Report of a Panel of the Interagency Task Force on Motor Vehicle Goals Beyond 1980], 1976.

7. C. Difiglio and D. Kulash, Methodology and Analysis of Ways of Increasing the Effectiveness of the Use of Fuel Energy Resources: Increasing Automobile and Fuel Economy via Government Policy, Department of Energy, Congressional Budget Office, presented to the U.S.-U.S.S.R. Joint Energy Committee: Information and Forecasting, 1977.

8. Jack Faucett Associates, Inc., Automobile Sector Forecasting Model User's Guide. Final Report, Jack Faucett Associates, Inc., Report No. JACKFAU-76-137-6, 1976.

9. Jack Faucett Associates, Inc., Automobile Sector Forecasting Model Documentation. Final Report. Submitted to the Federal Energy Administration. Jack Faucett Associates, Inc., Report No. JACKFAU-76-137-7, 1976.

10. D. H. Golomb, M. M. Luckey, J. H. Saalberg, B. C. Richardson, and K. B. Joscelyn, An Analysis of the Wharton EFA Automobile Demand Model, UMI Research Press, Ann Arbor, Mich., 1979.

11. B. C. Richardson, W. S. Barnett, D. C. Roberts, P. B. Sanghvi, L. D. Segel, and K. B. Joscelyn, An Analysis of the Jack Faucett Associates Automobile Sector Forecasting Model, The University of Michigan, Ann Arbor, Mich. [Highway Safety Research Institute report (draft)], 1979.

WORKSHOP I - DISCUSSION

Ray Waller, Los Alamos Scientific Laboratory. Ron Lohrding could not be here this morning because of another commitment; thus, it is my pleasure to welcome you to the first workshop, a new part of the symposium. This session is now open for discussion. The first presentation was given jointly by Mike McKay and Andy Ford from Los Alamos Scientific Laboratory.

Carl Harris, Consultant, Washington, D.C. I am concerned somewhat about your consideration of the underlying stochastic character of your inputs. It would seem that the assumption of uniform distributions for most of the parameters may lead to excessive variability in the outputs. Therefore, it should be possible to make tighter estimates of the results and thus make them easier for management to understand. Did you have any unusual reason for making such assumptions, and would you gain some extra insight by altering them?

Mike McKay. Our philosophy in performing sensitivity analyses is to take the model developer at his word when he specifies the range of values for which the model will work. If he thinks uniform distributions are appropriate, we use them. Certainly, the variation in the outputs could be reduced by using other distributions. However, the resulting "tighter estimates" would be fictitious relative to the original uniform distributions.

Andy Ford. I believe that if we had used normal distributions on many of the inputs, the variability on the test of the COAL 2 national energy model would have shrunk. For example, in the work by the Norwegian economist on the world oil price, the range of plausibility was set, and the distribution for the majority of the cases was normal. Yet, in that particular exercise he found 20% of his runs exhibiting the case where the world oil price came down during the next several decades. I agree with the general trend of what you said.

Carl Harris. Not to really beat the point to death, but it does seem to me that the utility - on a very major general issue - of models like this is very much a function of the ability of the model builder to relay to the decision-maker exactly what's going on. And therefore, it's somehow incumbent upon the model builder to try to make probabilistic statements of value. And in this particular case, I think he can do it. I don't think it's really that much of a problem, frankly. I think you can tighten up some of those values. For example, the minimum and maximum values on a run or a series of runs of a hundred are really of very little value because the likelihood that they're going to occur is terribly small. So one thing you can do is to take something like the interquartile range, which could automatically sharply reduce a couple of those drawings. I think you can iterate with the manager, and then eventually, when you end up going back to the decision-makers, the decision-maker really wants the variability on the output side to be as tight as is actually possible within reason, because otherwise the results really are questionable.

Tommy Wright, Union Carbide Corporation, Nuclear Division. Can you say more about the technique of the selection of input values, and is there an attempt to control which values are selected first?

Mike McKay. In a Latin hypercube sample of size N, each input is represented by N distinct values spanning its range from low to high. The values of the inputs are combined at random, so there is no "control" over the matching of values among the inputs.
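
The following minimal sketch illustrates the sampling scheme McKay describes; it assumes independent inputs, each uniform over a stated range, and is an illustration rather than the Los Alamos implementation.

```python
import numpy as np

def latin_hypercube(n_runs, ranges, rng=np.random.default_rng(0)):
    """Latin hypercube sample: N distinct values per input, one from each of N
    equal-probability strata, with values matched across inputs at random."""
    n_inputs = len(ranges)
    design = np.empty((n_runs, n_inputs))
    for j, (lo, hi) in enumerate(ranges):
        # one draw from each of the n_runs strata of [0, 1), then scaled to [lo, hi)
        strata = (np.arange(n_runs) + rng.random(n_runs)) / n_runs
        rng.shuffle(strata)                     # random matching across inputs
        design[:, j] = lo + strata * (hi - lo)
    return design

sample = latin_hypercube(5, [(0.0, 1.0), (10.0, 20.0), (100.0, 500.0)])
print(np.round(sample, 2))
```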

Lee Abramson, Nuclear Regulatory Commission. In the examples cited, the distributions of the input variables were assumed independent. However, in practice, many of these input variables could tend to be correlated, and assuming independence would tend to increase the variability in the output. To what extent can these correlations be estimated, and how significant would their effects be?

Mike McKay. In another presentation at this meeting, Ron Iman will talk about selecting values when the inputs are dependent, which is the situation we have when the inputs are correlated. In all of the cases I've studied, the inputs have been independent. I'm sure that, in some cases, treating inputs as correlated can significantly reduce output variability.

Andy Ford. On the subject of the variability of the hundred or so runs, one technique that might reduce the variability is to take the models - for example, COAL 2 - that are principally used for policy analysis. When you use a policy analysis model, you always show two runs: one without the policy and one with. You compare the two runs, and you get some difference. Now in the examples we show, we show the variability from one run to another. If you were willing to double the number of runs, you might do one run without oil and gas deregulation and another run with oil and gas deregulation. Then what you would display is the change in oil and gas imports over time, which would be your new output. You could then run the sensitivity testing to see if that difference showed high variability. I suspect that the difference between the oil and gas imports in these policy tests, as we start varying all 70 inputs to the model, would show a much smaller range of variability.

Mike McKay. Under relatively loose restrictions, one can show that the variance of a linear combination of independent random variables is maximized when the random variables have uniform distributions. Often, we use the uniform distribution in an attempt to maximize the variability of the output.

Robert G. Easterling, Sandia Laboratories. I would like to make some comments or ask some questions in two areas. One, it makes me uneasy to make 100 Latin hypercube runs with 72 variables. In such a situation, the collinearity may be so high that any variable selection is highly suspect. Variable selection in this case will be successful only if many of these variables have negligible effects. I would prefer to run some kind of group screening design first to eliminate some variables, and then follow that up with some sort of orthogonal main-effect plan. I wonder if the authors would compare this strategy to the one they used.

A second concern is interpretation. You found that in 20% of your Monte Carlo runs, the Average Energy Price (AEP) declined. Do you attach a probabilistic interpretation to this? For example, can we say that the estimated probability that AEP declines is 20%? Or might you say that your personal odds against the AEP declining are 4:1? It seems to me we shouldn't attempt any such interpretation, but I'm afraid some people do. It seems to me that the purpose of these assumed input distributions is to generate unusual combinations of input variables - combinations one might not think of if he just tried to write down 100 sets of inputs. We shouldn't try to attach a probabilistic interpretation.

Mike McKay. With regard to the collinearity statement, it is true that some samples can produce high simple correlations between two inputs. In practice, we don't use samples where this happens. The two-input difficulty is a special case of the general problem that some columns of the design matrix (which correspond to the different inputs) are linearly dependent. For the sake of discussion, let's suppose that column 1 is a linear combination of columns 2 through 5. Then any statement about variable 1 applies equally well to the linear combination of variables 2 through 5, and vice versa. In particular, if our stepwise selection procedure selected variables 2 through 5, it would never select variable 1. This is a difficulty common to most stepwise procedures in regression analysis.

The design matrix could be studied to find columns which were (almost) linear combinations of other columns, and alternative subsets of selected inputs could be generated. This probably should be done.

Concerning screening, we are generally confronted with the situation that a few of the variables are "known" to be important beforehand, but the effect of most of them is unknown. Unfortunately, we have several outputs to study, and each is a function of time. I don't know how to screen when I want to study a set of input variables whose importance changes for different output variables and at different times.

Finally, concerning your orthogonal main effects statement, I think we get much more information about an input when we use N distinct values of it spread out over its range than we would get with just two or maybe three different values. Of course, any reasonable method of selecting input values will work when the model is really linear.

Andy Ford. On the interpretation of the number of runs in which an unusual feature appears, I would not try to attach a probability of occurrence to that, but instead use that as an indicator of where to search for possible flaws in the model. If you saw, for example, that the demand for oil and gas went negative in 5% of your runs, that would be a clue that you should do some thorough searching for the flaw. And this, a decline in the oil and gas price, is not an obvious sign of a flaw, but it's an unusual feature, and so it prompted us to try to examine why it occurred. We're using this more to make sure there are no skeletons in the closet than as a device to give projections of the probability of some unusual occurrence.

Richard Beckman, Los Alamos Scientific Laboratory. Bob, how do you screen 72 variables with only 100 computer runs?

Bob Easterling. What I suggested was group screening designs. You might group the 72 variables into 18 groups of four variables. Each of these groups constitutes a "super-factor," which you could run in a design for screening 18 variables. The "high" level of each super-factor would involve setting each of the four variables in that factor at those levels thought beforehand to lead to a high response, and similarly for the low level. Thus, you might be able to eliminate several groups of four variables in, say, 20 runs. Admittedly, it's very difficult to do an intelligent sensitivity analysis of 72 variables in 100 runs. That's why I'm skeptical of turning the problem over to the unintelligent device of a random number generator.
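
A minimal sketch of the grouping Easterling describes follows; the grouping, the assumed high and low levels, and the one-super-factor-at-a-time screen are all illustrative assumptions, not a design from the discussion.

```python
import numpy as np

n_inputs, group_size = 72, 4
groups = np.arange(n_inputs).reshape(-1, group_size)   # 18 groups of 4 inputs each

low  = np.zeros(n_inputs)   # assumed levels expected to give a low response
high = np.ones(n_inputs)    # assumed levels expected to give a high response

def super_factor_run(settings):
    """Build a 72-element input vector from 18 super-factor settings (0 = low, 1 = high)."""
    x = low.copy()
    for grp, s in zip(groups, settings):
        if s:
            x[grp] = high[grp]   # push all four inputs in the group to their high levels
    return x

# Example screen: one super-factor high at a time (18 runs) plus the all-low base case.
runs = [super_factor_run(np.zeros(18, dtype=int))]
runs += [super_factor_run(np.eye(18, dtype=int)[k]) for k in range(18)]
print(len(runs), "runs; first run, first 8 inputs:", runs[0][:8])
```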

Tony Olsen, Pacific Northwest Laboratories. There is a reference on LHS in Technometrics that describes the procedure. Two problems are discussed: (1) sensitivity analysis related to model verification and (2) uncertainty analysis of output given specified input levels and uncertainties. The distinction is necessary when discussing model evaluation.

Wes Nicholson, Pacific Northwest Laboratories. We've heard three very interesting papers this morning which touch on some aspects of model evaluation. Basically, these papers consider, first, a formal definition of the model-building process (an application of the scientific method) and, second, display, interpretation, and summarization of model output. Model evaluation has many aspects. We need a consensus on these as a framework for discussion and the development of a program on model evaluation.

As I see it, this modeling process includes, first, model definition and statement as a mathematical system; second, verification, the debugging of a mathematical algorithm to solve the system; third, scoping the domain and range; fourth, validation, the comparison with reality; and fifth, the use of the model as an extension of reality, both in detail and as a forecast.

Sensitivity analysis is part of several aspects of the modeling process. As used by McKay this morning, it helped define the range of the model. Much of the discussion centered on the appropriate weighting of domain variables so that range weights reflect reality. Is this a useful exercise? The discussion of the Latin hypercube design to cover the domain variables in a sensitivity analysis is an example of a broad question on what kind of experimental designs are most useful for domain variable selection. Certainly sequential designs need looking at. Selection of Latin domain variable combinations based on the model output for early sets seems like a good idea. Finally, with respect to sensitivity analysis, how can we move away from the black-box approach? A problem that has long plagued statisticians is the use of prior information. Clearly, the model developer has some idea how the model will behave. Also, the model is the solution to a mathematical system, usually a set of implicit relationships among variables and possibly their derivatives. Mathematical analysis of the system should provide some information on sensitivity to specified variables.

During the validation stage, there is the question of what role comparison with subjective reality should play. Does acceptance by experts and agreement with other models constitute validation or partial validation? Clearly, final validation is objective comparison with physical reality. Sometimes there are a few realizations - not enough for a fit and residual analysis as a function of domain variables, but just a few benchmark points. Can something rigorous be done in such a situation?

Andy Ford. I thought the second paper and the third paper presented an approach to model evaluation that's much broader than just the sensitivity analysis which Mike McKay and I addressed. So, their group - the MIT Energy Lab and their model assessment laboratory - is taking a broad view. And if you want another broad view of model evaluation, you may want to examine the forum project under way at Stanford University for EPRI. Both these projects - MIT and the forum - are much broader in their scope than the sensitivity analysis that you directed much of your comments toward.

Dave Roberts, University of Michigan. I think we are attempting to remove the black-box atmosphere that surrounds some of these models. Many policy makers just examine the output that comes out of these things and think that they have the forecast of what's going to happen in, say, 1984. Part of the model evaluation process is determining what kind of range we have on those forecasts. For example, is it give or take a million cars sold in 1984, or is it give or take 300,000 units sold? Part of the analysis process is examining just how useful those numbers are - how reliable they are.

Paul Baybutt, Battelle Columbus Laboratories. Who should be responsible for model evaluation - statisticians or model developers? What should be the nature of the interface between model developers and statisticians in the process of model evaluation?

Mike McKay. I definitely think statisticians should be involved with model evaluation. If nothing else, we come into the problem relatively stupid but willing to try very stupid things on models, which generally produce unusual results that the model developer has to face up to. Statistics is concerned with the study of variation, and you can be sure that if you take a model away from the model developer and give it to a statistician and say, "Play with it," he's going to get it to vary. I think this is the heart of the matter. The model developer is very much needed, I can say from experience, to help out with problems that arise during the sensitivity analysis. I have never had a model that the developer said was ready to go that would work all the way through an analysis. The model developer does a different kind of analysis of his model than a statistician does. The statistician supplies supporting evidence, possibly guiding evidence, to the developer. But whether it's the model developer by himself or the statistician by himself, or maybe an independent assessor by himself, I don't think any one of those people individually can do the job correctly and completely. They all have to work together and use what each other knows about the model.

Mark I. Temme, General Electric Advanced Reactor Systems Department. I'd like to make an observation, and then I have a particular question for Mike McKay. Although I'm not a statistician, this whole discussion has been extremely interesting to me because I have been concerned about things being done in sensitivity analysis that make me very nervous. I don't have the equipment to understand why I'm nervous or to solve the problems; however, this group seems to express some of the same concerns and perhaps also has the capability to do something about it - which leads me to what I wanted to ask Mike about. There have been some recent applications of the technique that Mike described to breeder reactor safety technology in which very complicated accident analysis codes are being examined. I've been a little bothered by fundamental questions having to do with the validity of what I see as interpretations of this result. (In part, Bob Easterling raised this same question.) I still suspect that maybe some of the users are not careful enough in adhering to the laws of statistics and probability when they make their conclusions. Because running your model is not a random process, you have to be careful about your use of certain statistical methods. Could you please elaborate on the restrictions imposed on your approach due to the fact that the models involved are not random (e.g., why can't one use t-tests, etc.)?

Mike McKay. The difficulty in applying standard regression analysis techniques to model evaluation arises because the models generally do not satisfy the assumptions (linearity and additive error) of regression analysis. Hence, fitting the output to a linear function of the inputs and using a t-test to measure the significance of a coefficient has no theoretical basis. But we don't do this. We use the critical values for the normal correlation coefficient as an ad hoc procedure because it has worked well in the past. We do not make confidence statements about our results.

G. Schwarz, Oak Ridge National Laboratory. I would like to respond to some of the previously raised questions on the objective of a model evaluation study. I understand that model evaluation addresses the problem of evaluating the predictive capabilities of a model and determining the potential uncertainty inherent in model predictions. The uncertainty can best be determined by validation studies, that is, a comparison of observations with predictions. If validation studies are impractical, several steps are considered relevant for model evaluation. These steps should comprise an analysis of the model, an analysis of the input parameters, a sensitivity analysis, an imprecision analysis, and perhaps an analysis of and comparison with alternative models. However, the highest "degree of belief" in the model's capabilities is gained by conducting model validation studies. An elaborate consideration of the aspects related to model evaluation has recently been published (see Oak Ridge National Laboratory report ORNL-5507).

Andy Ford. Somewhere in the middle of your statement there appeared a phrase, "the only way to improve validity is by comparing with observations." If you think of the validation process as a process of invalidation - searching for the errors that need to be eliminated - a lot of useful things can be discovered in doing simple tests with the model: (1) holding the principal driving function (like demand) constant and seeing how the model responds; (2) trying to verify (like they do at MIT) whether the model, when put up in-house, reproduces the published results; and (3) the simple discoveries of the omissions in the documentation of the Wharton model, or the programming errors in the model investigated at MIT. All these findings are valuable and, when the problems they reveal are corrected or eliminated, can improve the credibility and usefulness of the model. Although the process is informal and not mathematical, it's still useful to people who are currently using these models to help make policy.

G. Schwarz. That's the fifth step I mentioned - to compare several models to evaluate or get an idea about the usefulness of a model. And of course, you run into philosophical problems. Some scientists argue that it is in principle impossible to validate a model; you can only invalidate a model, because a model is always an abstraction of reality, so you never can absolutely simulate a physical phenomenon or an economic phenomenon by a model. We should realize that every model - whether it is a validated model or a nonvalidated model - is associated with uncertainty. But the highest confidence we can get in a model is in a validated model: if we have some observations, we can compare them with our predictions.

Dale Rasmuson, EG&G, Idaho. I have a couple of comments. At Idaho we are collecting a lot of data, and we have a group that is called the model assessment group. Basically, they would be looking at data collected from PDF for fuels. I think they have something like several hundred fuel rods that have been irradiated at steady-state conditions. They have several hundred rods that have been irradiated at transient conditions. And basically this group is just running a computer code to see if they can match predictions or not and assessing the difference between them. Now, I think one problem we have is how do we as statisticians infiltrate the engineering environment? We sit here and talk about a lot of good things, but I don't see us making great strides in really getting these ideas out where they need to be used.

Also, we have TRAC and RELAP that are used to predict the environment during an accident situation. I know that sometimes they will make a run and compare it against data in a similar scale, and they'll tweak some things around and get agreement. We have various experimental facilities - Semi-Scale, Loft - and there are foreign facilities now that are producing data under similar circumstances. What are your feelings on how codes such as TRAC and RELAP should be assessed over facilities?

Mike McKay. The process we use to compare TRAC predictions with experimental data is not sophisticated. First of all, there is now a blind prediction, where the developers do the best modelling they can to predict results before the experiment is carried out. After the experiment is run, they correct initial and some boundary-like values to match the setup actually used in the experiment. The predicted results (usually graphs which are functions of time) are compared by eye with the laboratory results. I should point out that there can be substantial experimental error.

When I now make runs, it's usually with a crude modelling where run times are on the order of 1-3 hr. Ranges of variation are put on selected inputs. We use Latin hypercube sampling to generate around 20-25 runs. Then, the experimental results, the best-estimate prediction, and the sample of runs that I made are examined together by eye.

D. Lynn Schaeffer, BDM Corporation, Oak Ridge, Tennessee. All of the points made by my colleague, Dr. Gunther Schwarz, are contained in a report which I published on a generalized model evaluation methodology applicable to environmental assessment models (ORNL-5507). I agree that model validation should be a major objective of model evaluation. Unfortunately, many models cannot be validated. For example, nuclear reactor consequence models, such as those contained in WASH-1400, fall in this category. The NRC is convening a workshop next spring to evaluate such models. Perhaps statisticians can be helpful in formulating methodologies for evaluating these kinds of models and for choosing the best among a suite of such models.

Richard Beckman. In your last statement, Mike, you said that models are recoded to match experimental data. Do they use any technique such as PRESS (leaving out one data point at a time), and if they don't, would it be possible?

Mike McKay. In our NRC work, the model is not fitted to the data as it is in regression. Hence, one cannot use PRESS. Also, there is usually only one data point (or graph), so if you could "leave it out" . . . .

Dc.i Can, Pacific Northwest Laboratories. I have one pure comment and two "questions." The implication seems to have been left that the variability of the output is reduced when there is a dependent relation among the input variables. This is not necessarily the case for nonlinear models. My first question is, to what extent have you automated or computerized your procedures, so that new models can be examined with a minimum of effort? I can see the value of procedures like yours for shaking down a model, for seeing if a model responds the way its developers intended. Random runnings of the model may pick out flaws that judicious, but unconsciously biased, selection of input values may fail to find. However, have you made it very clear to the people who use such models for decision making that model shakedown does not constitute scientific validation of the model and that your findings may have nothing at all to do with the soundness of their decision making?

Mike McKay. Usually, the dependencies introduced among inputs are such that they exclude regions of the input space where the model would produce "wild" answers. For this reason, I say dependencies reduce variation.

Now, our procedures are all computerized, but they are not automatic in the sense that all one has to do is supply a deck of cards as input. Having decided what one wants to vary in the model, it's a simple matter to generate the set of input values. However, it can be troublesome to modify large and complex models to set the numbers in and produce a file of output values. Finally, we have codes which do the analyses with the input and output files.

As for our findings having "nothing at all to do with the soundness of" decisions about the validity of a model, that statement is incorrect. You said earlier that our techniques "may pick out flaws" in a model. I believe this is very useful for demonstrating deficiencies in a model.

If you are saying "scientific validation" as if there were a set of necessary conditions which would have to be demonstrated before a model could be declared valid, then it is unlikely that any substantive model could be "proven valid." You might say that we are looking for evidence that supports invalidity, and that not finding any after a vigorous sensitivity analysis increases our confidence in the model.

Paul Baybutt. It seems to me that there is a great need to define what one means by model evaluation. From the presentations by today's speakers and comments from the floor, it appears to me that there really is not a consensus as to what is meant by model evaluation, and I think that perhaps would be the first step. Secondly, I think what one would then need to define is the role of statisticians in model evaluation. It was mentioned by Dr. Schwarz that sensitivity analysis and uncertainty analysis are two areas within model evaluation.

Andy Ford. This relates to an extremely unphilosophical problem. It has nothing to do with the role of models, the role of statisticians, or "what is truth" or "what is validity." The point I'd like to make is that model evaluation or model validation is a growing area of interest because the Congress called on the Department of Energy to assess the validity of the models that were built, and a significant amount of research funds poured into that area. Consequently, a smaller amount of research funds was available for what was previously known as model development. One possible consequence of this shift of funds is that a model developer will develop his model; run out of money; call up a group; and say, "I want you to validate this device." They hand it to him, and they find there's no documentation; there's no base-case run; there's no explanation of the estimation; and under the guise of model evaluation, some subcontractor proceeds to do the job that should have been done first by the model developer. And this particular problem could loom larger as funding is more available in the category of model evaluation than in the category of model development.

David Roberts. Certainly, the first round of model validation or model evaluation should be done by the model developer. And hopefully, in the future, contracts for model development will include a section that will be exclusively used for validating the purposes of the model. It is hoped, at least in our group, that this will occur. Unfortunately, it seems that someone - model builders? sponsors? - underestimates the amount of effort required to do these models.

Mike McKay. I'd like to close by returning to the first question, concerning the amount of variability in the output in our study. Many model developers seem to overestimate both the domain of applicability of their models and the precision of their predictions. An important function of the statistician is to give the developer a realistic picture of the pattern of model outputs by a good sampling of the input space. This process can serve both to help satisfy the developer that the model is working and to identify regions of interest that might otherwise go unnoticed.

Jim Gruhl. I just have one further point that didn't come up in the discussion here, and that is that the assessment efforts, particularly independent assessment efforts, shouldn't stand by themselves. They also deserve critique, and the most valuable critique, of course, is going to come from the model builder himself, who will look at the assessment and should have a rebuttal chapter in any assessment reports describing misconceptions that the assessors had. But there are a number of other ways in which the validation effort can be assessed, and I think that, aside from just checking the completeness of the coverage of the model and techniques and so on, the assessments could again be assessed by somebody else, and I believe that there's room for additional organizational work on that whole assessment.

Dave Roberts. The model evaluation process is not a static thing. It's going to be evolving over time, too. Just as models improve to take into consideration some of the criticisms made by model evaluators or the builders themselves, the evaluators' expectations of the models are going to change, too.

D. A. Gardiner, Union Carbide Corporation, Nuclear Division. A report attempting to define model evaluation and the role that the mathematician or statistician can play will soon be issued. The main authors are Toby Mitchell and George Wilson. It will receive the standard distribution and should be at your laboratory soon.

Thank you for your participation in this first work-shop.

Workshop II: Risk Analysis

Roger H. Moore, Organizer

SUMMARY

The disciplines of statistics and risk analysis share many conceptual and practical aspects. This workshop was organized to examine and display these areas of mutual concern and to delve into issues still unresolved.

Other workshop sessions at this symposium revealed strong requirements for careful systematic problem statements and early involvement of statisticians with subject-matter specialists. Risk analysis is no different; the four papers and the subsequent panel-audience discussion revolved around these generic matters.

Another purpose underpinned the organizing of this particular workshop: the bald and unabashed appeal to the statistical community to increase its efforts in risk analysis - and in its follow-on effort, now often called risk assessment - while the subject is developing in a wide variety of societal/governmental/scientific arenas. To cite only one example, the National Science Foundation is organizing a series of workshops on Risk and Decision Making to be held in 1980. "The objectives of the workshops will be (a) to assess the current state of knowledge concerning major issues in risk assessment and the use of information about risk in the decision-making process and (b) to develop an agenda for future research on the science and technology aspects of risk assessment." (The quotation is from an NSF letter signed by Alden S. Bean, dated August 1, 1979.)

The information and ideas developed during this 1979 DOE Statistical Symposium workshop on risk analysis should serve as a state-of-the-statistical-art point of departure for statisticians and others participating in this emerging topic. Such propinquity does indeed seem propitious.

PAPERS

Taken as a group, the four formal workshop papers provide a review of risk analysis that is deliberately tilted toward statisticians. The papers are tutorial in nature, but they are not nontechnical.

Taken individually:

Moore seeks to " . . . focus, examine, and juxtapose the various issues and interfaces and interferences between the two disciplines that are the subjects of this workshop: statistics and risk analysis."

Rowe, in describing society's methods for coping with risks, states, "A variety of approaches and processes have been proposed. Although many of these are analytical . . . , the problem which involves human values and opposing constituencies cannot be solved by rigorous analysis alone."

Vesely concentrates on the " . . . probability and statistics problems [that] arise when an attempt is made to quantify nuclear power plant risks and to use the risk results in decision making."

Uppuluri, declaring "an urgent need" to provide mathematical foundations for risk analysis, asks, "What can the mathematical and statistical community of DOE do to alleviate this situation?"

Material provided by the authors for inclusion in these Proceedings follows.


Statistics and Risk Analysis—Getting the Acts Together

Roger H. Moore
Office of Management and Program Analysis
U.S. Nuclear Regulatory Commission
Washington, D.C.

ABSTRACT

Everybody knows what statistics is/are . . .

Everybody knows what risk analysis/assessment is/are . . .

Or do they/we? . . .

The purpose of this discussion is to provide a metaphorical Fresnel lens/prism through which we can focus, examine, and juxtapose the various issues and interfaces and interferences between the two disciplines that are the subjects of this workshop: statistics and risk analysis.

A POINT OF DEPARTURE

Item 1: The 1978 Directory of Statisticians, published by the American Statistical Association, contains 15,828 names.

Item 2: The registration list for the Symposium/Workshop on Nuclear and Nonnuclear Energy Systems, Risk Assessment, and Governmental Decision Making, sponsored by the MITRE Corporation in Washington, D.C., February 5-7, 1979, contains 129 names.

Item 3: The intersection of Items 1 and 2 contains 11 names.

Both bad news and good news can be claimed. Only 0.07% of people listed in the Directory attended the Symposium/Workshop - but 8.53% of the Symposium/Workshop attendees are listed by the Directory! The most important conclusion, however, is that the intersection list is not empty. A mixing process is developing.
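For readers who want to check the arithmetic, the two percentages quoted above follow directly from the counts given in Items 1-3:

    directory_names, attendee_names, overlap = 15828, 129, 11
    print(round(100 * overlap / directory_names, 2))   # 0.07 (% of Directory names)
    print(round(100 * overlap / attendee_names, 2))    # 8.53 (% of attendee names)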

Because no set of proceedings from the Symposium/Workshop crossed my desk before this writing (September 23, 1979), I invoke the divine right of workshop organizers and offer the following observations culled from the meeting report I filed with my Office Director on February 13, 1979. The collective of concepts and procedures designated as "risk assessment" (RA) received a rather thorough airing, reflecting many points of view, with plenty of work remaining for both the thinkers and doers connected to this nascent discipline. Among the points† I believe especially pertinent to this 1979 DOE Statistical Symposium are the following:

1. There is no established definition of RA, no established RA procedure, and no established role for RA in government decision making.

2. The public view of "risk" is not at all the same as mathematical definitions.

3. Both qualitative and quantitative RA are impacted by social, political, ethical, and moral concerns.

4. Courts - as well as Congress - strongly influence governmental agency priorities.

5. The true "value" of a regulation can be obtained only after it is established.

6. Regulations on health and safety are "uniformly awful" owing to low-calibre quantitative methods.

7. It was proposed that risk be defined as the probability density of consequences ("losses") to provide a working basis for discussion.

8. Hazard is to be distinguished from risk.

9. Is the notion of probability well understood - in terms of individual reactors? in terms of public perception? in terms of expertness?

10. RA must be performed openly, with peer checking and public involvement. Informed skepticism is to be encouraged.

11. Nobody seems to know a good way to assess public perception of risk.

12. RA is "on the margin of knowledge" as far as most government agencies are concerned.

*This paper represents the views and opinions of the author; it is not officially expositive of the Commission's position and not binding on the Commission.

†These items are paraphrases of remarks made by speakers and other participants; they all reflect my interpretation and understanding of the particular issues.

A number of other thoughts, less directly connected to this workshop, were derived from the MITRE meeting:

1. RA was performed in ancient times in the beginnings of merchant insurance associations.

2. Governments have always been in the business of assessing risk, but formalization began during the late 1960s.

3. "Minimum risk" is now being replaced by "comparative risk."

4. Agencies tend to overestimate benefits and underestimate costs.

5. The Occupational Safety and Health Administration (OSHA) does not accept "zero risk" concepts.

6. OSHA does not use cost/benefit methods in writing health and safety regulations.

7. The Environmental Protection Agency (EPA) compares hazard with its exposure.

8. Two nuclear engines produce all the earth's energy - the molten center and the sun.

9. No energy system is "completely" benign.

10. The Ford Pinto case is used to flag enthusiasm and to spur more work.

11. Somehow, the Federal Government must reach an agreement on the "cost of human life" issue.

12. Nuclear power is supported by the "general public" in many polls, but it is not considered as "viable" as other power sources (coal, oil, and/or solar) by Washington leaders (Congress, administrative officers, etc.).

13. It is doubtful that the "probability of sabotage" concept will bear fruit.

14. Don't expect "the Government" to respond because it is composed of special interest groups which support different sides of any issue.

Finally, a personal comment: It was unfortunate that many of the government speakers of the first morning's sessions left the scene as soon as they were finished and did not seem to be around to hear the rest of the discussions, for their actions and attitudes are vital aspects of risk-related technology and its role in making societal choices.

WHAT ARE WE TALKING ABOUT?

For me, one of the striking moments of the 1976 DOE Statistical Symposium was Brian Joiner's rejoinder not to assume that our audiences know what we are talking about and not to avoid returning to first principles to enhance our communications. He then proceeded to define statistics to an audience of statisticians. Audacious? Perhaps, but I have applied that principle many times in the years since, and a few hooded eyes is a small price to pay for the security derived.

Thus it is that I am inspired to offer in the next two subsections some concepts and activities relating to statistics and risk assessment. For some in this audience, these offerings are too confining and too restrictive; for others, they are too loose and too fuzzy to be of any value. Emphasis is placed on central aspects; the labels and the words are derived from an ecumenical spirit, not from a hard-core position that these-and-only-these are definitive.

Buoyed by this high-principled statement - and harboring a not-too-easily-subverted set of professional prejudices - let us plunge into waters sometimes dark, sometimes only murky, and sometimes astonishingly clear.

Information, Data, Statistics, and Statisticians

Information: That which is received through the act of imparting knowledge.

Data: Those elements of information that are either (a) quantified by their basic nature, such as measurements of distance or volume or counts of events or entities, or (b) capable of being categorized, such as type or color or opinion.

Statistics: The science and the art of the treatment of data - their collection, organization, analysis, presentation, and interpretation - and the conceptual framework supporting that treatment. (As a plural noun, Statistics is a synonym for data and quantities that are functions of data.)

Statisticians: Practitioners of the discipline of statistics, developing the necessary theory and applying the resulting methodology to a variety of subject-matter areas. The discipline of statistics can and should be distinguished from such disciplines as mathematics, probability, operations research, risk analysis, quality assurance, and systems analysis, recognizing that each of these disciplines contributes to and receives from the others. A major function of modern professional statisticians is the proper treatment of probabilistic phenomena and their characterization in consonance with data; another is directed at methods by which information can be structured into data.

Thus, with a few bold strokes, a discipline called statistics has been sketched; it is sufficiently catholic in its nature to embrace Karl Gauss and Dizzy Dean, the Reverend Thomas Bayes and the eclectic economist, the electric meter reader and the redoubtable race track tout. However, for a number of reasons, including those given in the first section, a similar sketch of risk assessment is not so facilely constructed. In the following section, we examine some of the elements requiring attention.

Hazard, Consequence, . . . , and Risk Assessment

Hazard: A thing or condition that might operate against success or safety; a possible source of peril, danger, duress, or difficulty. (A rock on a cliff over a highway is a hazard to a motorist who desires to negotiate the highway.)

Consequence: An effect or result, usually considered to be undesirable, when a hazard goes into operation. (The rock falls and the consequence can take any of several forms: the motorist is killed, the motorist's vehicle is damaged, the highway is closed, nothing worth noting occurs - i.e., "zero consequence," . . . )

Probability: A numerical measure assigned to each member of a set of events in order to elucidate the "likelihoods"* of the occurrences of the events. [Different schools of probabilists, e.g., the subjectivists, the frequentists, the likelihoodists, have differing methods for assigning probabilities and for making inferences from them, but the distinctions are encompassed by this statement. Note T. P. Speed's remarks in the section "Pulling It Together."] (For each consequence of the rock fall, a value is assigned to conform to the "rules of probability.")

*"Likelihoods" are defined in terms of degrees of belief, relative frequencies, or other (sometimes mysterious) concepts. To declare that further discussion about "likelihoods" and their quantification is beyond the scope of this paper is not a cop-out. For the purposes of this paper, "likelihoods" are the primitives upon which the rest is built.

Risk: A measure of the outcome of a hazard's going into operation, usually expressed as a function of the set of consequences and the corresponding probabilities. (A function of the quantification of the consequences - say, dollars - and the corresponding probabilities stemming from the rock's falling; a small numerical sketch follows these definitions.)

Risk Analysis: The detailed examination that is carried out in order to understand the nature and to determine the essential features of risk.

Qualitative Risk Analysis: The branch of risk analysis whose scope is to identify and to characterize the components of risk.

Quantitative Risk Analysis: The branch of risk analysis whose scope is to determine the magnitudes of the components of risk.

Risk Assessment: The critical appraisal or evaluation of a risk analysis, primarily for the purpose of deciding if a risk emanating from a given hazard is "acceptable" or if it can be "traded off" for another risk; if not, then the result of the assessment may be to require elimination of the hazard itself or the development of measures to reduce the risk by modifying the consequences and/or their associated probabilities. (N.B.: Risk assessment is a much broader concept than risk analysis.)
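Below is a minimal numerical sketch of the Risk entry above, using the rock-fall example. Expected loss is shown only because it is one common choice of "function of the set of consequences and the corresponding probabilities"; the definition deliberately leaves the functional form open, and every probability and dollar figure here is invented for illustration.

    # Hypothetical outcomes of the rock-fall hazard going into operation:
    # (probability, consequence in dollars). All numbers are assumptions.
    outcomes = [
        (0.001, 1000000),   # motorist killed
        (0.010, 5000),      # vehicle damaged
        (0.050, 500),       # highway closed
        (0.939, 0),         # nothing worth noting ("zero consequence")
    ]
    assert abs(sum(p for p, _ in outcomes) - 1.0) < 1e-9   # "rules of probability"
    expected_loss = sum(p * c for p, c in outcomes)        # one possible risk measure
    print("expected loss:", expected_loss)                 # 1075.0 dollars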

The following comments should be noted:

1. Risk analysis and risk assessment are disciplines in a state of rapid development, change, and growth. The entries proposed here are limited to ideas that seem to be "relatively" stable within the risk assessment community. Carefully avoided are the burgeoning ideas related to such topics as "dimensions" of risk, perceptions of risk, legal implications of risk assessment, and "risk accounting."

2. Primary sources (in no particular order) influencing this suggested glossary include Symposium/Workshop on Nuclear and Nonnuclear Energy Systems Risk Assessment and Government Decision Making, the MITRE Corporation, Washington, D.C., February 5-7, 1979; An Anatomy of Risk by W. D. Rowe; Of Acceptable Risk by W. W. Lowrance; "Philosophical Basis for Risk Analysis" by C. Starr et al.; Webster's Third New International Dictionary; and The American College Dictionary. (Fuller references appear in the bibliography.)

3. Whatever are the ultimate choices for a particular subject matter area's treatment of "risk," practitioners ought not to stray too far from the "mainstream" of rigorous, carefully justified risk assessment. Academia, government, and industry can ill afford toleration of work with inferior intellectual and practical content. There must be a flexibility in the subject matter area to reflect changes as they occur in the discipline of risk assessment - and as requirements change within the area itself. This is not to imply that all subject matter areas must adhere to a single precept when a new approach will serve better.

Thus, the sketching of a discipline called risk assessment requires the juggling and ordering of a number of ideas and concepts. The glossary given here is, at best, a set of suggestions to fuel the debate. Particularly vexing is the definition of "probability." Moreover, there is the nagging problem concerning which portions of work by risk assessment pioneers are to be kept sacrosanct, which are to be remodeled, and which are to be discarded. It is none too soon to make serious efforts at defining risk assessment's frost line and to seek appropriate foundations upon which to build. This requires more than mere formalism of mathematical relationships; in the governmental context especially, the public perception of risk cannot be ignored.

PULLING IT TOGETHER

It's a heady experience for a practitioner from one discipline to pontificate and delineate another discipline, and it's dangerous as well. Witness the many forays into statistical texts by chemists and sociologists. If there is to be useful interaction between disciplines, however, it is a risk that must be taken.

To date, there has been precious little interaction among risk assessors, probabilists, and statisticians. Some skirmishes have been useful, others have inflicted enduring pain; see, for example, Easterling (1980), Meehan (1979), and Weaver (1979).

T. P. Speed, way off in Western Australia, has been struggling with these matters. I asked him to comment on an early version of these risk assessment concepts, and he sent the following in a letter dated May 4, 1979:

On the topic of terminology, I certainly agree that it is important. To focus on areas of vagueness, and, if possible, areas in which we know what we are talking about (where what we measure is not only what we think we are measuring, but is what we want to measure, and is the right thing to measure, . . . ) is a task whose importance can hardly be overrated, yet statistics books and courses say so little about it.

. . . one thing cries out [to me] for modification: the statement on probability. It MUST be said that probabilities are numbers attached to events or propositions, and are evaluated by someone, under some assumptions (given some hypotheses), by some method. Until we know the evaluator, the assumptions, and the method, we have no idea what quantitative inferences can be made, i.e., we are speaking of "pr(A/H) by person using method V," where H stands for the hypothesis or given assumptions.

Surely enough problems have been caused in the area of QRA [quantitative risk analysis] by mixing "statistical" probabilities, "a priori" probabilities, and "personal" probabilities (not to mention the more refined schools) for you to feel a strong need to avoid a definition or explanation of the term which allows "equal status" to all. They are not just different "methods," they are worlds apart in "meaning" as well, and I would be most reluctant to see any parity between them accepted without a fight.

. . . this is just an off-the-cuff response. I'll send a more careful reply shortly, but a general remark: the ones we need to be most careful about are the ones which get quantified. Accordingly we need not lose too much sleep over an inability to define hazard or consequence in a completely satisfactory way. But our inability to do so with probability and risk should never be forgotten!

Sorry if this sounds like a tirade: it is a topic which stirs me.

All of the emphasized and quoted jargonal phrases in the foregoing are Speed's own. There are messages for all of us; to fail to grapple with the implied issues could, literally, place us in peril.

A READING ASSIGNMENT

Each of us, as we enter a new arena of activity, possesses a tendency to press for the quick hit, to see issues in simple terms, and to wonder why the participants are still struggling over elementary issues when the far horizons are grander and more to our liking. But the flip side of this attitude has been attributed to Paul Anderson, who, according to Dickson (1978), stated with unmistakable clarity: "I have yet to see any problem, however complicated, which, when you looked at it the right way, did not become still more complicated."

One way to reduce this hazard and its consequences is to steep oneself in the literature of the subject matter area so that some control can be gained over the Santayanian dictum relating history, repeatability, and doom. The bibliographic material attached to this paper is not intended to be exhaustive, but it is intended to provide the reader with a sense of what has gone before and the current issues in risk assessment.

The "essential tension" described by Kuhn (1977) is established in the communities of statistics and risk assessment. It is time to get on with the discovery and the application of proper analgesics.


ACKNOWLEDGMENTS

The content and orientation of this paper were strongly influenced by hours of discussion, argument, and criticism (constructive and otherwise) proffered by numerous acquaintances, some of whom may recognize their contributions and will appreciate having their anonymity protected. Nevertheless, three close NRC colleagues provided especially creative feet-to-the-fire reviews: David Rubinstein, Dan Lurie, and Lee Abramson. Their cover will not be maintained.

Craig Rowland, also NRC, provided a big lift with his careful structuring of the bibliographic material and his assembly of the data in the first section.

Timely and critical (in at least two senses of the word) drafting, typing, and proofing efforts were provided by Sue Young, Eileen Moore, and Terry Barnhart.

AN ARBITRARY BIBLIOGRAPHY

Bean, Alden S., "Addendum to PRA Program Announcement NSF 78-78," Letter, National Science Foundation, Washington, D.C., Aug. 1, 1979.

Carbon, Max W., "Report of Quantitative Safety Goals," Letter to Joseph M. Hendrie, Chairman, U.S. Nuclear Regulatory Commission, May 16, 1979.

Carroll, Lewis, The Annotated Alice, Introduction and Notes by Martin Gardner, World Publishing, Cleveland, 1968 [especially p. 146].

Cholmondeley, Leroy, and Mayer, Hugo, "On a New Method in Hazardology," J. Irreproducible Results 24: 17-18 (1978).

Comar, Cyril L., "Risk: A Pragmatic De Minimis Approach," Science 203: 319 (1979).

Dickson, Paul, The Official Rules, Delacorte, New York, 1978. [Excerpts appear in The Washingtonian, November 1978, pp. 152-55.]

Easterling, Robert G., Letter, The American Statistician, expected to be published in the February 1980 issue.

Garb, Solomon, Hickman, Howard M., and Smith, Kirk R., "The Right to Take Risks," Am. Sci. 67: 266-68 (1979).

Holdren, John P., Smith, Kirk R., and Morris, Gregory, "Energy: Calculating the Risks (II)," Science 204: 564-67 (1979).

Innis, George S., "Statistical Quality," Science 204: 242 (1979).

Kolata, Gina Bari, "Frederick Mosteller and Applied Statistics," Science 204: 297-98 (1979).

Kolata, Gina Bari, "Scientists Attack Report That Obstetrical Medications Endanger Children," Science 204: 391-92 (1979).

Kuhn, Thomas S., The Structure of Scientific Revolutions, Univ. Chicago Press, Chicago, 1962.

Kuhn, Thomas S., The Essential Tension, Univ. Chicago Press, Chicago, 1977.

Lowrance, William W., Of Acceptable Risk: Science and the Determination of Safety, William Kaufmann, Los Altos, Calif., 1976.

Martin, J. David, "A Measure of Association for All Seasons," J. Irreproducible Results 24: 19 (1978).

Meehan, Richard L., "Nuclear Safety: Is Scientific Literacy the Answer?" Science 204: 571 (1979).

Nielsen, Sigurd O., "Risks and Public Policy," Science 205: 748-49 (1979).

Okrent, David, Testimony Presented to the Forum on "Risk/Benefit Analysis in the Legislative Process," Subcommittee on Science, Research and Technology, Committee on Science and Technology, U.S. House of Representatives, July 25, 1979.

Okrent, David, "Risk Accounting," Science 204: 1154 (1979).

Rothschild, Nathaniel Meyer Victor, "Risk," Richard Dimbleby Lecture, British Broadcasting Corporation 1, Nov. 23, 1978. [Excerpts appear in "Coming to Grips with Risk," The Wall Street Journal, March 13, 1979.]

Rowe, William D., An Anatomy of Risk, Wiley, New York, 1977.

Sagan, Leonard, "Radiation and Human Health," EPRI J. 4(7): 6-13 (1979).

Scherer, F. M., "Statistics for Governmental Regulation," Am. Stat. 33(1): 1-5 (1979).

Shapley, Deborah, "NASA Says FAA Understates Air Crash Risk," Science 204: 387-89 (1979).

Starr, Chauncey, "Public Perception of Radiation Risk," EPRI J. 4(7): 2-3 (1979).

Starr, Chauncey, Rudman, Richard, and Whipple, Chris, "Philosophical Basis for Risk Analysis," Annu. Rev. Energy 1: 629-62 (1976).

Union of Concerned Scientists, The Risks of Nuclear Power Reactors, a Review of the NRC Reactor Safety Study WASH-1400 (NUREG-75/014), Union of Concerned Scientists, Cambridge, Mass., 1977.

U.S. Nuclear Regulatory Commission, Reactor Safety Study, an Assessment of Accident Risks in U.S. Commercial Nuclear Power Plants, WASH-1400 (NUREG-75/014), National Technical Information Service, Springfield, Va., October 1975.

Wade, Nicholas, "Thomas S. Kuhn: Revolutionary Theorist of Science," Science 197: 143-45 (1977).

Weaver, Suzanne, "The Passionate Risk Debate," The Wall Street Journal, April 24, 1979, 11-12.

Westman, Robert S., "The Kuhnian Perspective" [review of The Essential Tension by Thomas S. Kuhn], Science 201: 437-38 (1978).

Wildavsky, Aaron, "No Risk Is the Highest Risk of All," Am. Sci. 67: 32-37 (1979).

Zivi, S. M., "Light Water Reactor Safety and Public Acceptance," pp. 259-72 in Nuclear Energy and Alternatives, ed. by Osman Kemal Kadiroglu, Arnold Perlmutter, and Linda Scott, Ballinger Publishing, Cambridge, Mass., 1977.


Distinctions Between Risk Analysis and Risk Assessment

William D. Rowe
American University Institute for Risk Assessment

American University, Washington, D.C.

ABSTRACT

Historically, society has been able to cope with both individual and societal risks on an ad hoc basis. The increasing complexity of society, caused, at least in part, by dependence on sophisticated technology, changing institutions, and revaluation of older values, requires a more formal approach to establish societal risks than heretofore. The formal process of risk assessment requires a balancing of costs, risks, benefits, and political factors, including establishment of acceptable levels of societal risk. Few things are risk free, and the equitable distribution of residual involuntary risks imposed upon different parts of society is a key aspect. A variety of approaches and processes have been proposed. Although many of these are analytical approaches, the problem that involves human values and opposing constituencies cannot be solved by rigorous analysis alone.

Although analytical techniques for risk assessment are numerous and offer some opportunities for addressing the problem, there are basic underlying limitations that must be well understood at the outset. No analytical technique or combination of techniques will solve the problem. The best that can be expected is clarification of issues and identification of critical parameters, decisions, and judgments. The decisions must be made by other processes.


Some Notable Probability and Statistics Problems Encountered in Risk Analysis Applications

W. E. Vesely
Office of Nuclear Regulatory Research
U.S. Nuclear Regulatory Commission
Washington, D.C.

ABSTRACT

A variety of probability and statistics problems arise when an attempt is made to quantify nuclear power plant risks and to use the risk results in decision making. Some of the more notable problems include (1) measures of uncertainty to use, (2) the way dependent failure probabilities are to be quantified, (3) believability of fault tree/event tree modeling, (4) the handling of subjective data, (5) whether "absolute" or "relative" probability evaluations are more believable, (6) Bayesian vs classical approaches, and (7) the way to use risk results and their uncertainties in decision making. These seven problems are addressed; research projects being conducted are described; and some worthwhile, additional studies are proposed.

INTRODUCTION

Whenever we attempt to estimate the probabilities and consequences associated with "catastrophic" nuclear reactor accidents, we encounter a variety of probability and statistics problems. These probability and statistics problems are not unique to risk analyses of nuclear reactor accidents, but they arise whenever we attempt to quantify low-probability, high-consequence events so that we can evaluate the associated risks.*

*We should point out that when we use "risk," we use it in a broad sense meaning "hazards" associated with events having certain probabilities and consequences. (We do not use "risk" to mean some statistical average, such as the average loss in decision analysis.) Furthermore, we do not differentiate here between "risk analysis" and "risk assessment," taking both to mean evaluations and studies dealing with event probabilities and consequences.

Based on risk analyses applications carried out by the Probabilistic Analysis Staff of the Nuclear Regulatory Commission (NRC), some of the more "notable" probability and statistics problems which we have encountered include:

1. How do we evaluate the uncertainty that is associated with an estimate of probability and consequence?

2. How do we quantify dependent failure probabilities?

3. How believable are the fault tree and event tree models used in a risk analysis?

4. How do we handle "subjective" data?

5. Do "absolute" or "relative" probability evaluations have less uncertainty?

6. Do we use Bayesian or classical statistical approaches?

7. How do we translate risk analyses results for decision making?

There are certainly other probability and statistics problems, and we don't claim that the above are the most difficult. However, the above problems are "notable" in that they prominently arise when actual risk evaluations are attempted. We will address the above problems in the remainder of this paper, breaking the paper into seven sections for the seven problems. We will attempt to identify specific obstacles confronting the analyst, will describe research being conducted by the Probabilistic Analysis Staff, and will propose additional studies that might be performed. We will also attempt to formulate some conclusions from our discussions.

Note that we are approaching these problems from a pragmatic analyst's point of view and do not pretend to give erudite dissertations. There will be no theorems or lemmas proved in our discussions; however, we hope the professional statisticians and mathematicians who are inclined to do such things will find something of value in our work. We also hope the analyst and engineer will find our discussions useful.

MEASURES OF UNCERTAINTY

In assessing the risks associated with catastrophic accidents, one of the notable statistics-related problems facing the analyst is how to quantify the large uncertainties generally associated with his estimates. In trying to express the uncertainties associated with the point estimates of probabilities and consequences, the analyst must first decipher definitions and phraseologies. Should the analyst do "sensitivity studies" or should he attempt to apply more formal "error propagation techniques"? What kinds of "uncertainties" and what kinds of "variabilities" should the analyst include in his evaluations?* Furthermore, should the analyst include "systematic errors" or "random errors"? If the analyst attempts to apply more formal probability and statistics approaches, then he must decide whether he should calculate one-sided bounds or two-sided intervals. If, for example, he does decide to calculate intervals, then he must decide whether they should be "confidence intervals," "tolerance intervals," "prediction intervals," "plausibility intervals," "fiducial intervals," "percentile intervals," or "Bayesian intervals." (If he does decide to calculate one-sided bounds, then he goes through similar alternatives with "bounds" replacing "intervals.")

*In our discussion here, we will use "uncertainties" and "variabilities" in the general sense according to Webster and will not attach any strict statistical definition to each word.

In addressing uncertainties, the analyst does not usually go through such detailed questioning; however, he usually is informed enough to recognize that there are various measures of uncertainty. The fact that there are different measures of uncertainties often causes the analyst to be frustrated, since the different uncertainty measures appear to be different for no reason at all or for philosophic rather than pragmatic reasons. (The different measures often give similar numerical values, don't they?) The outcome of this frustration may often be that the analyst will decide not to quantify any uncertainties at all - not even in a gross subjective way! (If the analyst were to quantify uncertainties, the decision maker might not know how to use them anyhow; this problem is covered later.)

As another justification of not quantifying uncertainties, the analyst sometimes argues that since he cannot formally quantify, or even subjectively quantify, all the uncertainties (due to modeling errors, lack of completeness, etc.), it would be deceptive if he quantified any uncertainties at all. Therefore, the analyst again decides not to quantify any uncertainties no matter how large they are, but simply mentions in his report that "there are uncertainties." He may even say, "there are large uncertainties." However, this is the extent of the uncertainty analyses, and these warnings and comments are usually ignored in applying the point estimates.

The analyst sometimes gets into this frustrating situation, and we speak from experience here, because he doesn't explicitly enumerate all the different kinds of uncertainties and variabilities that can be associated with his estimates of accident probabilities and consequences. The problems about what uncertainty measures to use also arise because the analyst doesn't explicitly separate the parameters, those constants that are fixed but perhaps imprecisely known, from the random variables, those variables that randomly vary due to physical causes or due to statistical model assumptions (i.e., Bayesian analyses). The probability distributions associated with the random variables may be imprecisely known, further contributing to the overall uncertainty.

The Probabilistic Analysis Staff at the NRC is involved in a variety of projects which address uncertainty considerations. A project is being carried out in which component failure rates are treated as random variables and an appropriate distribution selected using parametric empirical Bayesian techniques [1]. A project has been planned with the Statistics Group at the Oak Ridge National Laboratory to address various probability and statistical aspects of risk analyses, including uncertainty evaluations [2]. All risk assessments performed under the auspices of the Probabilistic Analysis Staff have some form of uncertainty evaluation carried out, which is described in the risk assessment reports. (For a description of the various projects being carried out by the Staff, see ref. 3.)

To aid the analyst in selecting uncertainty measures and to complement ongoing efforts, we think it would be useful if "primers" were prepared in which the different measures of uncertainty were clearly defined and their applicability clearly discussed. The discussions would be most useful to the risk analyst if they were not saturated with statistical jargon or sophistry and were carried out in the context of a risk analysis addressing uncertainties associated with probability estimates (e.g., system unavailabilities, component failure rates, etc.) and addressing uncertainties associated with consequence estimates (e.g., source terms, health effects, etc.). As a problem-solving application, an actual risk analysis could be performed, and the different sources of uncertainty and the different uncertainty measures could be evaluated. These "primers" could take the form of guideline documents, and the effort could be carried out in conjunction with one or more professional organizations (among other options).
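As one concrete illustration of the "error propagation" option mentioned earlier in this section, the sketch below propagates assumed component-level uncertainty through a toy system model by Monte Carlo sampling and reports a percentile interval. The system layout, the lognormal error factors, and all numbers are invented for illustration; this is not a procedure taken from the projects cited above.

    import math
    import random

    def sample_lognormal(median, error_factor):
        # Error factor taken here as the ratio of the 95th percentile to the median,
        # a common risk-analysis convention (an assumption, not a universal rule).
        sigma = math.log(error_factor) / 1.645
        return median * math.exp(random.gauss(0.0, sigma))

    def system_unavailability(p_pump, p_valve):
        # Toy model: each train fails if its pump OR its valve fails; the system
        # fails only if both (assumed independent) trains fail.
        p_train = 1.0 - (1.0 - p_pump) * (1.0 - p_valve)
        return p_train ** 2

    random.seed(0)
    results = []
    for _ in range(20000):
        p_pump = sample_lognormal(1e-3, 3.0)    # assumed median and error factor
        p_valve = sample_lognormal(3e-4, 10.0)  # assumed median and error factor
        results.append(system_unavailability(p_pump, p_valve))
    results.sort()
    print("median:", results[len(results) // 2])
    print("5th-95th percentile interval:",
          results[int(0.05 * len(results))], results[int(0.95 * len(results))])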

ESTIMATION OF DEPENDENT FAILURE PROBABILITIES

In a risk analysis, if the analyst wants to naively estimate the probability of multiple components failing (or multiple events occurring), then he assumes the component failures are independent and simply multiplies the individual component failure probabilities together. Component failures (or events), however, can be dependent, and the fear of the analyst and decision maker is that dependencies can cause the actual probability of multiple components failing to be drastically larger than the probability calculated assuming independence. Because large consequences are often associated with multiple components failing or multiple events occurring, the effects of dependencies can have very significant impacts on assessments of probabilities of catastrophic accidents.

Since the publication of the Reactor Safety Study [4], other papers have discussed possible ways of estimating dependent failure probabilities [5-8]. Some of these papers were reacting (to put it mildly) to the "subjective" technique used in WASH-1400 where dependent failure probabilities were estimated by the infamous "square root law."* Even though various techniques have been proposed, we have not seen any alternative approach which is "statistically valid" and, at the same time, has some physical basis and is really usable by the analyst and engineer (our definition of "usable" includes not having to wild-guess input data required for the model). Of the various alternative approaches proposed, we feel that the beta-factor model [5] is one of the most reasonable and is usable in certain situations; however, its applicability is limited.
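The sketch below contrasts, with invented numbers, three ways of putting a value on the probability that two redundant components both fail: pure independence, the "square root law" described in the footnote later in this section, and a simple beta-factor estimate. It is only an illustration of the general ideas, not a reproduction of the models in refs. [4] or [5]; the single-component probability and the beta value are assumptions.

    import math

    p_single = 1e-3   # assumed failure probability of one component
    beta = 0.1        # assumed fraction of failures arising from a common cause

    p_independent = p_single ** 2                        # both fail, no dependence
    p_square_root = math.sqrt(p_independent * p_single)  # square root of (independent
                                                         # probability times single-
                                                         # failure probability)
    p_beta_factor = beta * p_single                      # common cause contribution,
                                                         # which dominates the tiny
                                                         # independent term

    print(p_independent, p_square_root, p_beta_factor)   # 1e-06  ~3.2e-05  1e-04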

A class of dependent failures that is important and tractable in a number of instances is the class of dependent failures called "common cause failures." Common cause failures are multiple failures that are attributable to a single identifiable cause, such as a human operator causing multiple sensors to be failed because of incorrect calibrations. Multiple components failing because of an earthquake occurrence is an example of an external-event-initiated common cause failure. The probability of common cause failures occurring can be estimated by estimating the probability of the basic identifiable cause - which occasionally can be reasonably done.

The Probabilistic Analysis Staff at the NRC is conducting research on dependent failure probability estimation using variations and extensions of the Marshall-Olkin shock model approach [9]. Component failure data are also being collected from operating plants, and common cause failures are being extracted and analyzed [10].

As suggestions for further research, we think more work could be done to develop new (or old!) multivariate failure distributions based not on pure statistical considerations but on engineering and failure cause considerations. For example, if distribution parameters were related to engineering variables, then the analyst might be able to obtain reasonable estimates for these parameters based on engineering considerations. As an additional area, more work could be done to develop general dependency models (perhaps heuristic) which try to generalize and unify the results obtained from specific statistical estimations (extensions of the beta-factor model might fit here, as might examinations of utilizations of general mixture models*).

LIMITATIONS OF FAULT TREE/EVENT TREE MODELING

When performing a risk analysis, the analyst often must estimate the probability of a complex system failing, such as a nuclear safety system failing, or must estimate the probability of a sequence of events occurring, such as several systems failing in some specified order. Because data are generally not available to directly estimate these compound event probabilities, fault trees or event trees, or some equivalent, are often utilized to express the event of interest (e.g., system failure) in terms of more basic events (e.g., component failures) for which some data are available.

*For the "square root law," the dependent failure probability is estimated by taking the square root of the independent failure probability times the probability for a single failure. This approach has been cast off.

*In a mixture model, the dependent failure probability is expressed as the sum of terms, each term being the probability of a defined "environment" existing times the probability of the multiple components failing in that environment.

Much has been written on fault trees and event trees [11-14], and we won't go into the properties or virtues of these kinds of analyses. Because of the Lewis Report [15] and because of the impact the Reactor Safety Study has had on risk analyses, fault trees and event trees are often incorrectly viewed as a panacea for all problems in risk analyses. In exhorting fault trees and event trees, the analysts and decision makers sometimes forget (or ignore) the limitations of these techniques. The limitations include the completeness problem (How do we know that we included all events of importance?), the resolution problems (How do we know that our events are defined to appropriate detail?), and the evaluation problems (Have we considered all important contributors and all important dependencies, and have we quantified correctly?).

We would like to briefly discuss here the dichotomizing problems associated with fault trees and event trees. When we use fault trees or event trees, then we must divide the entire space of possible outcomes, for every phenomenon and every component failure considered, into two possibilities, "success" and "failure." If we are smarter, then we can partition the outcomes into three or four disjoint possibilities (e.g., "complete success," "partial failure," and "complete failure"); however, we now have the problem of how to quantify and how to logically combine these different possibilities.
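A minimal sketch of the dichotomous quantification just described is given below: every basic event is forced into a two-state "failed"/"successful" description, and a small, hypothetical fault tree is evaluated with AND/OR gates under assumed independence. The events, the tree, and the numbers are all invented; partial outcomes such as valve leakage would have to be binned into one of the two states before this calculation could even begin.

    def gate_and(*probs):
        # Output fails only if every input fails (independence assumed).
        out = 1.0
        for p in probs:
            out *= p
        return out

    def gate_or(*probs):
        # Output fails if any input fails (independence assumed).
        survive = 1.0
        for p in probs:
            survive *= (1.0 - p)
        return 1.0 - survive

    p_valve = 1e-3        # assumed "failure" probabilities for basic events
    p_pump = 3e-3
    p_operator = 1e-2

    p_train_a = gate_or(p_valve, p_pump)   # train A fails if its valve or pump fails
    p_train_b = gate_or(p_valve, p_pump)   # identical, independently modeled train B
    p_top = gate_or(gate_and(p_train_a, p_train_b), p_operator)
    print("top event probability:", p_top)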

If we examine Licensee Event Reports (LERs) [16], which record failures and other events of potential risk significance occurring at nuclear power plants, then we see many instances of "continuous-like" occurrences. By continuous-like occurrences, we mean either of two kinds of events: (1) a failure or "partial" failure, such as valve leakage, whose severity is representable by a continuous random variable ("leakage") and (2) a chain or sequence of multiple events, such as multiple human errors, where any one event is inconsequential but where the chain has serious consequences.

If we were to use a dichotomous success-failure approach and attempt to estimate the probabilities of these two kinds of "continuous-like" events, then we would face problems. For the first kind of event, we would have to subjectively define the occurrence as being either a "success" or "failure" where our choice would depend on the specific model and application we had in mind. For the second kind of event, if we assume independence and simply multiply the probabilities for each basic event to obtain the probability for the chain, then we would often get an incredibly low probability estimate. Even if we assume some sort of dependency among the basic events, the resulting chain probability might still be very low and be unbelievable. To obtain a "credible" probability (say above 1 × 10⁻⁴ per reactor year), we usually have to define a very gross event, of which the chain occurrence is a member, and we then estimate the probability for the gross event, which ignores many of the details of the actual occurrence.

In our view, the most applicable statistical model for these two kinds of LER events is one in which the events are treated as realizations of some continuous random variable (or variables). Also, it is our view that multiple "partial" failures (e.g., multiple valves leaking) can have consequences as severe as a single "total" failure (e.g., one valve rupturing) and, at the same time, can have an occurrence probability greater than the occurrence probability for the total failure. In the Reactor Safety Study, the whole "gray" area of partial core melts (e.g., only some rods melting) was either ignored or was lumped together with the total core melt definition. This gray area of partial failures and partial core melts, if investigated separately and with appropriate techniques, could give useful information about new risk contributors.

Partial core melts, partial failures, pipe cracks, and other events, which are best describable by continuous random variables, are very difficult to adequately model using fault tree and event tree approaches. Therefore, we think that additional effort needs to be devoted to developing probabilistic models and statistical analyses that use continuous random variable approaches and multistate approaches to evaluate occurrence probabilities associated with these phenomena. Some useful work has been done in this area [17-19]; however, additional efforts could produce useful insights on sensitivities, data requirements, and potential impacts on calculated risks.
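The following sketch illustrates the continuous-random-variable alternative being advocated: leak severity is treated as a continuous quantity with an assumed lognormal distribution, and "failure" becomes an exceedance probability at whatever threshold the application requires, rather than a single yes/no dichotomy. The distribution choice and every number are assumptions for illustration, not values drawn from Licensee Event Reports.

    import math

    def lognormal_exceedance(x, median, sigma):
        # P(severity > x) when ln(severity) is normal with mean ln(median) and
        # standard deviation sigma.
        z = (math.log(x) - math.log(median)) / sigma
        return 0.5 * math.erfc(z / math.sqrt(2.0))

    median_leak = 0.5   # assumed median leak rate (arbitrary units)
    sigma = 1.2         # assumed logarithmic standard deviation
    for threshold in (1.0, 5.0, 20.0):
        print("P(leak >", threshold, ") =",
              lognormal_exceedance(threshold, median_leak, sigma))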

THE UTILIZATION OF SUBJECTIVE DATA

Many different kinds of data are generally utilized in a risk analysis. For example, for a nuclear reactor risk analysis such as performed in the Reactor Safety Study, the input data consist of estimates of component failure rates and average repair times, human error rates, reactor operating parameters for accident scenario development, and weather and population distributions - to name just some of the data.*

*As used here, "data" thus means the input to a risk analysis required to produce numerical probabilities and consequences.

Depending on the sources and the means of estimation, the data are sometimes broken into two types - "hard" data and "subjective" data. Hard data refer to actuarial data statistically inferred from actual experience. Hard data, for example, include estimates of component failure rates based on actual times of component failures. Subjective, or "soft," data consist of estimates made subjectively by the analyst or decision maker or obtained by polling experts using, for example, Delphi techniques. If experts are asked to subjectively estimate component failure rates based on their experience, then these failure rate estimates would be called subjective data.

The classification of data into "hard" and "soft" is, of course, rather arbitrary, because all data used in risk analyses involve decisions and judgments made by the analyst, and, in this sense, all data are subjective. Even so-called hard data are often subjective extrapolations of reported or measured data for application to the particular problem at hand. Even though the hard vs soft data classification is crude, we think it does serve to point out the problem of having to handle data from all different kinds of sources when performing an actual risk analysis.

From our experience, subjective data in the form of experts' opinions can be a particular problem because of the many different ways of soliciting, handling, and interpreting the data. Many risk analyses performed today use experts' opinions as part of their data, and we feel these risk analyses can still be useful even if the experts' opinions are rather arbitrary in some cases. However, if we do not want to deceive ourselves and interpret more into the results than are there, then we have to carefully treat the subjective data and subsequent risk results.

The careful treatment of subjective data involves (1) establishing procedures for soliciting experts' opinions and obtaining consensus estimates if desired, (2) constructing statistical models to describe the experts' variability, (3) developing sensitivity and uncertainty analyses to describe the impacts of the subjective data on subsequent results, and (4) establishing validation approaches to check the accuracy and realism of the experts' opinions. There have been some useful studies discussing the treatment of expert opinion [20-23], and the Probabilistic Analysis Staff is using these and other studies in conducting surveys to obtain experts' opinions on human error rate values [24] and estimates of flood occurrence probabilities [25]. More work, however, needs to be done in addressing the treatment of subjective data, specifically considering the kinds of data used in risk analyses, the kinds of evaluations performed in risk analyses, and the kinds of utilizations in which the risk results are used.
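As one small illustration of item (2) above - a statistical description of experts' variability - the sketch below pools invented expert estimates of a human error rate with a geometric mean and a multiplicative (log-scale) spread. This is merely one common pooling convention, offered under assumed numbers; it is not the procedure of refs. [20-25].

    import math

    expert_estimates = [3e-3, 1e-2, 5e-3, 3e-2, 1e-3]   # invented per-demand rates

    logs = [math.log(x) for x in expert_estimates]
    mean_log = sum(logs) / len(logs)
    var_log = sum((v - mean_log) ** 2 for v in logs) / (len(logs) - 1)

    print("geometric mean estimate:", math.exp(mean_log))
    print("multiplicative spread (one log-sd factor):", math.exp(math.sqrt(var_log)))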

UNCERTAINTIES OF "ABSOLUTE" AND "RELATIVE" EVALUATIONS

In risk analyses, two kinds of probability evaluations are sometimes identified, "absolute" evaluations and "relative" evaluations. The terminology here may be somewhat colloquial; however, the concept is generally applicable. Absolute evaluations are those in which system failure probabilities and accident probabilities are viewed as being the final results of the evaluations. Relative evaluations are those in which ratios (or differences) are viewed as being the final results of the evaluations. (The ratios could be ratios of system failure probabilities, ratios of accident probabilities, or ratios of individual contributors to the total.)

Because of the Lewis Report, because of various published statements, and because "it seems right," there is a strong feeling, at least for nuclear risk assessments, that relative evaluations have less uncertainty than absolute evaluations. Along this same line, it is felt that relative evaluations can be carried out with only subjective data (sometimes called engineering guesses) and still be meaningful, although, to be meaningful, absolute evaluations require precise, hard data obtained from actual history.

We certainly do not share this feeling regarding relative vs absolute evaluations. It is one thing to say that we require rather gross data if we want to simply rank different event probabilities without actually calculating the ratios of the probabilities. It is another thing to jump from this ordinal vs cardinal number reasoning to the conviction that relative evaluations have less uncertainty than absolute evaluations. Because a ranking does not actually involve probability numbers and because we can get the same ranking for a variety of data values, particularly when the probabilities being ranked differ by orders of magnitude, we view the rankings as being one of the most robust results of a risk analysis.
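The toy experiment below illustrates the robustness claim: when contributor probabilities differ by orders of magnitude, perturbing the assumed inputs by sizable factors moves the absolute values a great deal but rarely reorders them. The contributors, base values, and perturbation factors are all invented.

    import random

    base = {"sequence A": 1e-4, "sequence B": 3e-6, "sequence C": 5e-8}

    def ranking(probs):
        return sorted(probs, key=probs.get, reverse=True)

    random.seed(1)
    reorderings = 0
    for _ in range(1000):
        perturbed = {k: v * random.uniform(0.2, 5.0) for k, v in base.items()}
        if ranking(perturbed) != ranking(base):
            reorderings += 1
    print("rank order changed in", reorderings, "of 1000 perturbations")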

For a risk analysis, if we could get away with simply calculating rankings, then we could avoid most of the problems discussed in this paper (and many problems not discussed). However, simply stating that one event or failure is more probable than another, without giving actual probabilities or ratios, is not adequate for most risk analyses. Decision makers desire probabilities and consequences as results from risk analyses even if these results have uncertainties and may involve subjectively estimated data.

We will not dwell upon the subject of whether decision makers expect too much from many risk analyses, even though we think they do. We will suggest that useful studies could be performed by utilizing various uncertainty measures and comparing the calculated uncertainties associated with absolute evaluations with those associated with relative evaluations. These studies would help to answer the question of whether relative evaluations really have less uncertainty as compared to absolute evaluations. Characteristic data could be used, attempts could be made to obtain general conclusions that are related to the model structure, and modeling uncertainties could be analyzed as well as data uncertainties. To address output result resolutions, different ways of expressing risk analyses results could furthermore be identified (giving rankings, giving the results in terms of defined, discrete bins, giving interval estimates, etc.).

BAYESIAN VS CLASSICAL APPROACHES

Because there have been so many discussions and arguments among statisticians on the question of whether Bayesian statistics or classical statistics is the "proper" statistics method, we dare not tread on sacred grounds in this area. We will address the problems from a pragmatic point of view and hopefully not offend either of the opposing sides.

The question of whether to use Bayesian statistics or classical statistics arises in a risk analysis because the analyst must decide how to treat his distribution parameters - as either being constant in the classical statistics approach or being describable as random variables in the Bayesian approach. This Bayesian vs classical question, of course, is not unique to risk analyses but arises whenever estimates must be statistically inferred for parameters and functions in a model to obtain numerical results from the model. Because our description of the Bayesian vs classical choice is brief and rather crude, we will give an example; basic philosophical and theoretical discussions on Bayesian and classical statistics are given in many statistical texts [26-28].

In a system reliability analysis in which we were recently involved, we used the classical statistics approach and treated the true component failure rates and true human error rates as being constants. The component failure rates and human error rates were the unknowns in the evaluation of system reliability. Our data then consisted of point estimates and associated interval estimates (confidence intervals) for the true failure rates and true human error rates. The point estimate and interval estimate for the true system reliability, treated as a constant, was calculated in a classical statistics fashion.

If we had used Bayesian statistics for our system reliability analyses, then we would have the same system reliability model except that now our data would consist of a prior probability distribution for each failure rate or error rate describing our degree of belief for various values that the failure rate or error rate (treated as a random variable) may have. If we had any histories of component failure times or occurrences of human error, then we could update our prior distributions to posterior distributions incorporating these histories. Calculation of the probability distribution for the system reliability, now treated as a random variable, would proceed along the lines of a usual random-variable-distribution propagation problem.
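A small numerical contrast of the two treatments just described is sketched below with an invented operating history for a single failure rate. The classical side gives a point estimate and a crude large-sample interval for a rate treated as a constant; the Bayesian side updates an assumed prior distribution for the same rate. The conjugate gamma prior with Poisson failure counts is used purely for convenience and is not the procedure of the analyses described in the text.

    import math

    failures, exposure_hours = 2, 50000.0      # invented operating history

    # Classical: the rate is an unknown constant; estimate it and bound it.
    rate_hat = failures / exposure_hours
    std_err = math.sqrt(failures) / exposure_hours
    print("classical estimate:", rate_hat)
    print("rough 95% interval:", max(rate_hat - 1.96 * std_err, 0.0),
          rate_hat + 1.96 * std_err)

    # Bayesian: the rate is a random variable with an assumed gamma(a0, b0) prior
    # (degree of belief); the same history updates it to a gamma posterior.
    a0, b0 = 0.5, 10000.0
    a1, b1 = a0 + failures, b0 + exposure_hours
    print("prior mean:", a0 / b0, "  posterior mean:", a1 / b1)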

We have used both Bayesian and classical statistics for risk analyses applications and have found both to be useful. When we have used classical statistics, then we have often had to resort to sensitivity evaluations instead of formal confidence-bound determinations because of the difficulty of propagating confidence bounds. In the same light, when we have used Bayesian statistics, then we have always had to resort to using different prior distributions to attempt to determine the impact of prior distribution assumptions on the final resulting distributions.

In using either Bayesian or classical statistics, we feel the really important matter is to clearly understand and identify all assumptions and to perform sensitivity studies on areas where knowledge is lacking and where reasonable alternatives exist. To help the analyst better understand the differences between Bayesian and classical statistics and the things that really matter, we think studies comparing Bayesian and classical statistics approaches in the context of a risk analysis would be worthwhile.

In these comparisons, the Bayesian and classical assumptions and viewpoints would be detailed as they apply to a risk analysis. Specific problems would be selected and would be worked out in detail, and the results would be compared and analyzed. All explanations would be aimed at the analyst, engineer, and decision maker and would be a key ingredient to this work. We feel these kinds of efforts would be more worthwhile than the constant feuding and bickering that appears to be going on between the Bayesians and classicists within the statistical community.

THE USE OF RISK ANALYSIS IN DECISION MAKING

The real payoff from a risk analysis comes when the results are used in some way for decision making. To realize this payoff, the analyst must decide how to characterize and explain the results so that they are understandable and usable by the decision maker. The analyst also often has to translate the decision maker's problem into one which is solvable using risk analysis techniques (or he has to decide that it cannot be solved using these techniques!). Having performed the risk analysis, the analyst then has to translate the risk results back into the framework of the decision maker.

In the projects in which we have been involved, we have had to translate calculated probabilities and consequences into proposed allowed downtimes for components, proposed administrative controls on operators, and proposed system design changes. We have also had to translate general safety questions such as "is the protection adequate enough" and "should we improve plant operating procedures" into defined problems in a risk analysis context, and we have had to explain the solutions, and their limitations, to decision makers.

In the interactions between analyst and decision maker, we have encountered some problems involving probability and statistics that we think are worthy of more study. There are often large uncertainties calculated for probability and consequence estimates; how to handle these uncertainties in making recommendations and in making decisions is an important and interesting problem. (The various measures of uncertainties and the various sources of uncertainties have been discussed earlier.) Oftentimes, quantified uncertainties are ignored in decision making, and practical guidelines and approaches for incorporating uncertainties in decision making should be developed. (In our opinion, obtuse decision-theoretic approaches are not what we call "practical.") Other important problems involve the quantitative summary measures to use to characterize risk, the criteria to use to compare two curves of probability vs consequence (including uncertainties), and the quantitative criteria to use to aid decision making regarding acceptability of the calculated probabilities and consequences.
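For illustration only, the following sketch computes two of the summary measures just mentioned - expected loss (probability times consequence) and a probability-vs-consequence (exceedance) curve - for hypothetical accident categories, treating the categories as mutually exclusive:

```python
# Hypothetical (annual probability, consequence) pairs for two designs;
# the numbers are illustrative, not results from any actual analysis.
import numpy as np

design_a = np.array([[1e-3, 10.0], [1e-4, 100.0], [1e-6, 10_000.0]])
design_b = np.array([[5e-4, 10.0], [5e-4, 100.0], [1e-7, 10_000.0]])

def expected_loss(pc):
    """Sum of probability * consequence over all categories."""
    return float(np.sum(pc[:, 0] * pc[:, 1]))

def exceedance_curve(pc):
    """Probability that the consequence equals or exceeds each level."""
    order = np.argsort(pc[:, 1])[::-1]          # largest consequence first
    probs = np.cumsum(pc[order, 0])             # cumulative probability
    return pc[order, 1][::-1], probs[::-1]      # ascending consequence levels

print("expected loss A:", expected_loss(design_a))
print("expected loss B:", expected_loss(design_b))
print("exceedance curve A:", exceedance_curve(design_a))
```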

The Probabilistic Analysis Staff is presently carrying out a project to formulate numerical risk criteria to be used in assessing the acceptability of calculated probabilities and consequences for existing and planned nuclear power plants. Acceptability criteria for core melt probability, for probability vs consequence curves, and for expected loss (probability times consequence) are among the criteria being considered.* Means of incorporating uncertainties; accepted reliability models for evaluating system failure probabilities, accident probabilities, and radiological consequences; and accepted data bases are being addressed in an attempt to "standardize" calculational approaches. The Probabilistic Analysis Staff certainly welcomes input to this project. Letters are currently being prepared requesting assistance from individuals, groups, and professional organizations.

In the interactions between analyst and decision maker, we think the analyst and decision maker really need to understand the statistical and modeling properties and limitations associated with a particular set of evaluations. To gain this understanding, we think it would be useful to prepare risk analysis guidelines and risk analysis primers that explicitly describe the various analysis approaches, their powers and limitations, and their areas of applicability. These risk analysis guidelines and primers could be used by analyst and decision maker alike.

CONCLUSIONS AND RECOMMENDATIONS

In our somewhat rambling discussions, we have attempted to identify some of the probability and statistics problems that arise when risk analyses are attempted. The problems we have identified certainly are not the only ones, and the suggestions we have made are not necessarily the best ones. Because of the problems and, in our view, because of the great importance of risk analysis to decision making, we believe that there needs to be closer involvement of analysts, engineers, and statisticians in attempting to develop answers and guidelines. It is time to stop the squabbles that have existed in the risk analysis field, particularly in nuclear risk analysis, and it is time to work together to solve the problems!

*As presently envisioned, if the calculated results are above the acceptability criteria, then they would be judged unacceptable and modifications would be required. If the results are below the criteria, then the standard "deterministic" reviews would proceed.


REFERENCES

1. J. K. Shultis and F. A. Tillman, Bayesian Analysis of Component Failure Data, Report KSU 2662-8 for Contract No. NRC-04-73-339, September 1978.
2. Letter from W. E. Vesely to V. R. R. Uppuluri, Subject "ORNL Trip of March 27, 1979, to Coordinate Methodology Activities," Apr. 27, 1979.
3. Project Status Report on Probabilistic Analysis Staff Activities, Research Project Control System, Office of Nuclear Regulatory Research, Washington, D.C.
4. U.S. Nuclear Regulatory Commission, Reactor Safety Study, an Assessment of Accident Risks in U.S. Commercial Nuclear Power Plants, WASH-1400 (NUREG-75/014), National Technical Information Service, Springfield, Va., October 1975.
5. K. N. Flemming, A Reliability Model for Common Mode Failures in Redundant Safety Systems, General Atomic Report GA-13284, December 1974.
6. G. E. Apostolakis, "Effect of a Certain Class of Potential Common Mode Failures on the Reliability of Redundant Systems," Nucl. Eng. Des. 36(1): 123-33 (January 1976).
7. G. T. Edwards and I. A. Watson, A Study of Common Mode Failures, UKAEA Report SRD-R-146, July 1979.
8. R. G. Easterling, "Probabilistic Analysis of 'Common Mode' Failures," pp. X.7-1 to X.7-12 in Proceedings of the ANS May 8-10, 1978, Topical Meeting on Probabilistic Analysis of Nuclear Reactor Safety.
9. W. E. Vesely, "Estimating Common Cause Failure Probabilities in Reliability and Risk Analyses," pp. 314-41 in Nuclear Systems Reliability Engineering and Risk Assessment, J. B. Fussell and G. R. Burdick, eds., SIAM, Philadelphia, 1977.
10. Licensee Event Report Analysis [Draft Reports on Pumps, Valves, Diesels, and Control Rods], EG&G Idaho, Inc., October 1979-October 1980.
11. J. B. Fussell, Fault Tree Analysis - Concepts and Techniques, NATO Advanced Study Institute on Generic Techniques of System Reliability Assessment, Liverpool, England, 1973.
12. R. E. Barlow, J. B. Fussell, and N. D. Singpurwalla, eds., Reliability and Fault Tree Analysis, SIAM, Philadelphia, 1975.
13. H. E. Lambert, Fault Trees for Decision Making in Systems Analysis, Lawrence Livermore Laboratory Report UCRL-51829, October 1975.
14. G. J. Powers et al., "Fault Tree Synthesis for Chemical Processes," AIChE J. 20(2): 376-87 (1974).
15. H. W. Lewis et al., Risk Assessment Review Group Report to the U.S. Nuclear Regulatory Commission, NUREG/CR-0400, September 1978.
16. Instructions for Preparation of Data Entry Sheets for Licensee Event Report (LER) File, NUREG-0161, July 1977.
17. J. Amesz, S. Garribba, and G. Volta, "Probabilistic Analysis of Accident Transients in Nuclear Power Plants," pp. 465-87 in Nuclear Systems Reliability Engineering and Risk Assessment, J. B. Fussell and G. R. Burdick, eds., SIAM, Philadelphia, 1977.
18. W. K. Brunot et al., "The Application of Probabilistic Techniques to Seismic Risk Analysis on the Diablo Canyon Plant," pp. XIV.5-1 to XIV.5-9 in Proceedings of the ANS May 8-10, 1978, Topical Meeting on Probabilistic Analysis of Nuclear Reactor Safety.
19. A. Amendola and G. Reina, "A New Probabilistic Approach to LMFBR Core Accident Sequence Evaluations," presented at the International Meeting on Fast Reactor Safety Technology, August 19-23, 1979, Seattle, Wash.
20. N. Dalkey, Group Decision Theory, UCLA Report UCLA-ENG-7749, July 1977.
21. M. A. DeGroot, "Reaching a Consensus," J. Am. Stat. Assoc. 69: 118-21 (March 1974).
22. S. J. Press, "Qualitative Controlled Feedback for Forming Group Judgments and Making Decisions," J. Am. Stat. Assoc. 73: 526-35 (September 1978).
23. L. R. Abramson, "Forming a Consensus from Subjective Probability Distributions," presented at the ORSA/TIMS Joint National Meeting, Los Angeles, Nov. 15, 1978.
24. Letter from R. E. Hall to J. Fragola, Subject "1979 IEEE Workshop on Human Factors and Nuclear Safety," Oct. 1, 1979.


25. Letter from W. E. Vesely to L. G. Hulman, Subject "Flood Risk Analysis Program," Jan. 5, 1979.
26. C. R. Rao, Linear Statistical Inference and Its Applications, 2nd ed., Wiley and Sons, New York, 1973.
27. G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Addison-Wesley, Reading, Mass., 1973.
28. M. G. Kendall and A. Stuart, The Advanced Theory of Statistics, Vols. 1, 2, and 3, Hafner Publishing Co., New York, 1973, 1961, 1966.


Risk Analysis and Reliability*

V. R. R. Uppuluri
Union Carbide Corporation, Nuclear Division

Oak Ridge, Tennessee

ABSTRACT

There is an urgent need for the mathematical foundations of Risk Analysis. Several scientists are asking for unambiguous definitions of risk and for sound techniques to estimate these measures. There is considerable interest in the comparison of risks associated with alternate technologies. What can the mathematics and statistics community of DOE do to alleviate this situation?

One of the important first steps seems to be the identification of mathematical and statistical problems in Risk Analysis. In this paper a few topics of interest are identified which require further development. These topics include (a) rare events, (b) series-parallel systems, and (c) treatment of expert opinion to obtain consensus estimates. Several other topics will be brought under discussion.

INTRODUCTION

Currently, considerable interest exists concerning the mathematical foundations of risk analysis. Scientists are asking for unambiguous definitions of risk and sound techniques to estimate these measures. There is an immense need for the comparison of risks associated with alternate technologies.

One of the pioneering efforts to apply risk assessment methodology by use of fault tree/event tree techniques is the "Rasmussen Report" or Reactor Safety Study (RSS) [1]. The Risk Assessment Review Group Report [2], which is a critique of the RSS, points out that the RSS team did not choose a specific definition of risk but displayed its results through graphs of the probability of occurrence of an event against the consequences of that event. After estimating the probability and consequences, the RSS compared estimated risks with the probability and consequences of other societal risks. The question is whether this approach is adequate. To answer it satisfactorily, one needs to understand the underlying concepts more clearly and to develop appropriate methodology. Lewis et al. [2] acknowledged the fact that the perception of risk is a very complex subject.

*Research sponsored by the Office of Basic Energy Sciences, U.S. Department of Energy, under Contract No. W-7405-eng-26 with the Union Carbide Corporation.

In "Probabilistic Methods in the Nuclear RegulatoryProcess," Levine [3] discusses the viewpoints and limi-tations on the application of risk assessment techniques.Levine hopes that the major contribution of risk assess-ment techniques in the nuclear regulatory processshould be in the form of background analyses that willaid in decision-making processes.

The ways risk analysis can be used with data analysis to obtain pertinent reliability and safety information for decision making have been discussed by Vesely [4], who defined risk analysis as follows: "Risk analysis is any quantitative analysis which determines safety implications, reliability implications, and/or cost implications from basic failure data and other basic information."

In February 1979, the MITRE Corporation organized a meeting on Risk Assessment and Governmental Decision Making. The Issue/Factbook [5] prepared for the meeting contains pros and cons of each issue and points to the need for the foundations of risk analysis. A recent report by the National Academy of Sciences [6] discusses the risks associated with nuclear power. It is said that an evaluation of these risks is based not only on the understanding of the technologies involved but also on a diverse fund of basic knowledge in the physical, biological, environmental, and engineering sciences.


In this paper, we identify a few topics of interest where the mathematical and statistical community of DOE can make contributions in the area of risk analysis. We point out the importance of having the same probability space in order to compare different experiments, discuss consequences as random variables with infinite expectations, describe the phenomenon of rare events, discuss series-parallel systems and different kinds of randomness that could be imposed on such systems, and explore the problem of consensus of estimates of expert opinion.

COMPARISON OF EXPERIMENTS

Let (Ω, F, P) be a probability space, where Ω is an arbitrary set, F a collection of subsets of Ω which forms a sigma field, and P a probability measure. Usually, physical experiments are idealized by probability spaces. For instance, if we roll a six-faced die once, Ω consists of the elementary outcomes: 1 dot, 2 dots, . . . , 6 dots; F contains events such as an even number of dots or a prime number of dots, etc.; and P is any desired probability measure. In this framework one can find answers to several problems of interest which arise when we roll a die once.
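As a small illustration of this framework (using the uniform measure as the "desired" probability measure), the die example can be written out directly:

```python
# Sample space, two events in the sigma field, and their probabilities
# under the uniform measure for a single roll of a six-faced die.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                      # elementary outcomes
P = {w: Fraction(1, 6) for w in omega}          # a probability measure

even = {w for w in omega if w % 2 == 0}         # an event in F
prime = {2, 3, 5}                               # another event in F

prob = lambda event: sum(P[w] for w in event)
print("P(even number of dots):", prob(even))    # 1/2
print("P(prime number of dots):", prob(prime))  # 1/2
```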

Suppose we wish to perform an experiment with a die which has 20 faces. The probability space associated with this experiment is different from the probability space associated with the experiment described above. The comparison of these experiments is a problem of a different magnitude. The existing statistical methods are restricted to a particular experiment and find answers to several questions.

CONSEQUENCE AS A RANDOM VARIABLE

Risk is commonly defined as the probability of an undesired event (e.g., see Rowe [7]). With this definition, it is not clear whether one is thinking of an experiment that can be idealized by a probability space or one is talking about probability as a measure of belief.

Given a probability space, one can talk about measurable functions (with domain Ω), referred to as random variables. Real-valued random variables are functions that map the set Ω into the real numbers. The probability measure P uniquely defines the cumulative distribution function F(x) of the real-valued random variable X, given by P[ω: X(ω) ≤ x] = F(x).

In the context of risk analysis, events of low probability and high consequence are of significance as opposed to events of high probability and low consequence. If consequences can be measured by real numbers, then consequences may be considered as real-valued random variables defined on a probability space. Professor I. Kotlarski of Oklahoma State University suggests that consequences should be identified by nonnegative random variables whose first moments are not finite. This is based on the premise that

lim_{x→∞} x[1 - F(x)] > 0 ,

where x denotes the consequence and 1 - F(x) denotes the probability that the consequence is greater than x. Consideration of consequences whose expectations need not be finite suggests new perspectives on the problems in risk analysis. For instance, the percentiles of the worst possible consequence, characterized by the cumulative distribution of the maximum value of X1, X2, . . . , Xn of n independent, identically distributed consequences, give us information on catastrophes. These ideas need to be explored further in the context of practical problems in risk analysis.

RARE EVENTS AND THE POISSON PROCESS

Events whose probabilities are very small are generally considered to be rare events. Suppose a pack of 52 playing cards is shuffled well and a bridge hand of 13 cards is dealt. There are exactly h = 635,013,559,600 different hands that can appear; therefore, the probability of any specified set of 13 cards appearing in a hand is equal to 1/h, a very small number, and the event may be considered an improbable event. But every time a hand is dealt, one of the h possibilities is absolutely certain to occur. Weaver [8] discusses this example and suggests that smallness of probabilities is not enough to discuss rare events.
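The count quoted above can be verified directly (a one-line check, shown here only for concreteness):

```python
# Number of distinct 13-card bridge hands from a 52-card deck, and the
# probability of any one specified hand.
import math

h = math.comb(52, 13)
print(h)          # 635013559600
print(1 / h)      # roughly 1.6e-12, yet some hand always occurs
```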

The Organization for Economic Co-operation and Development has a Nuclear Safety Division, which has a Committee on the Safety of Nuclear Installations. This committee appointed a Task Force on Problems of Rare Events in the Reliability Analysis of Nuclear Plants (1976-1978). This task force studied the problem of rare events thoroughly and concluded that the low probability events should be studied in conjunction with high consequences [9].

At times, the Poisson process is referred to as the phenomenon associated with rare events. We shall not go into the well-known derivation of the Poisson process based on the properties: (1) stationarity, (2) a process with independent increments, and (3) the impossibility of the occurrence of more than one event in a small interval of time Δt. These characteristics lead to a difference-differential equation whose solution is the Poisson distribution. We shall now present a set of axioms, including an axiom on rarity, which lead to a functional equation whose solution is the Poisson process. This result is due to Janossy, Renyi, and Aczel [10].

Let us consider a counting process N(t), which denotes the number of events observed during the time interval [0, t], and let pk(t) = P[N(t) = k].

Axiom 1. The process is homogeneous in time; that is, we assume that the probability that exactly k events occur in the time interval (t1, t2) depends only on the length of this interval, t = t2 - t1. This is true for all t1 < t2 and all k = 0, 1, 2, . . . .

Axiom 2. The process is of the Markov type; that is, the number of events occurring during the interval (t1, t2) is independent of the number of events occurring during the interval (t3, t4), where t1 < t2 < t3 < t4.

Axiom 3. The events are rare; that is,

lim_{t→0} p1(t)/[1 - p0(t)] = 1 ,

or, equivalently,

lim_{t→0} [1 - p0(t) - p1(t)]/[1 - p0(t)] = 0 .

In other words, as t → 0, the probability of one event occurring in the time interval (0, t) is asymptotically equal to the probability of at least one event occurring in the same interval.

Theorem (Janossy, Renyi, and Aczel): Axioms 1, 2, and 3 imply that

pk(t) = (λt)^k exp(-λt)/k! ,   k = 0, 1, 2, . . . .

Professor Roy Leipnik of the University of California at Santa Barbara suggests several generalizations of the definition of rarity given here. For instance, one may define the process to be α-rare if

lim_{t→0} p1(t)/[1 - p0(t)] = α .

One may also define the process to have "unprecedented events" whenever the time interval between two events has an infinite first moment. It is well known that

P[N(t) = k] = P[Wk+1 > t] ,   k = 0, 1, 2, . . . ,

where Wk+1 is the sum of k + 1 independent and identically distributed interarrival times. Thus a counting process whose interarrival times have an infinite first moment is said to be a process with unprecedented events. The characterization of the associated counting process seems to be an open problem. The implications of these generalized concepts of rarity and the concept of unprecedented events in the context of problems in risk analysis have yet to be investigated.
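As a small numerical check of the rarity axiom as stated above for the ordinary Poisson process (the rate chosen below is arbitrary), the ratio p1(t)/[1 - p0(t)] can be seen to approach 1 as t → 0:

```python
# For a Poisson process with rate lam: p0(t) = exp(-lam*t) and
# p1(t) = lam*t*exp(-lam*t); the rarity ratio tends to 1 as t -> 0.
import math

lam = 2.0  # assumed rate
for t in (1.0, 0.1, 0.01, 0.001):
    p0 = math.exp(-lam * t)
    p1 = lam * t * math.exp(-lam * t)
    print(f"t = {t:6.3f}   p1/(1 - p0) = {p1 / (1.0 - p0):.6f}")
```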

SERIES-PARALLEL SYSTEMS

A simple model which has several applications in reliability theory and allied topics is a series-parallel system. Consider a system with k + 1 subsystems, called cut sets, C0, C1, . . . , Ck, arranged in series. The system fails if at least one of the cut sets fails. Let us suppose that the cut set Ci has ni components arranged in parallel, i = 0, 1, 2, . . . , k. Let us assume that no two cut sets have a component in common. Then any cut set fails to function only if all the components of the cut set fail. One can answer several questions, such as the probability of failure of the system because of the failure of a specified cut set or the statistical properties of the life length of the system, under different assumptions. El-Neweihi, Proschan, and Sethuraman [11] assumed that, after t components have failed, each of the remaining components is equally likely to fail, and the components fail one at a time. They found answers to some of the problems mentioned above. Sobel, Uppuluri, and Frankowski [12] developed efficient recurrence relations and numerical algorithms and a set of tables useful in multinomial sampling problems. These tables give answers to the problems discussed above when the numbers of components ni, i = 0, 1, . . . , k, are large. Feasibility studies were made to do similar computations in the case of hypergeometric sampling.
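A minimal sketch of this series-parallel model, assuming independent components with known failure probabilities (the numbers below are hypothetical), computes the system failure probability from the cut-set structure:

```python
# Series-parallel system: the system fails if any cut set fails, and a cut
# set fails only if all of its parallel components fail (independence assumed).
import numpy as np

def system_failure_probability(cut_sets):
    """cut_sets: list of lists of component failure probabilities."""
    p_cut_fails = [np.prod(q) for q in cut_sets]           # all components fail
    p_survives = np.prod([1.0 - p for p in p_cut_fails])   # every cut set works
    return 1.0 - p_survives

# Hypothetical system: three cut sets with 2, 3, and 2 redundant components
cuts = [[0.01, 0.02], [0.05, 0.05, 0.05], [0.001, 0.1]]
print("P(system failure) =", system_failure_probability(cuts))
```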

At times, common mode failures are studied using the series-parallel system as a model (see, e.g., Abramson [13]). The components in a cut set Ci are assumed to have failure distributions, such as the exponential distribution. The probability of the failure of the system and the statistical properties of the waiting time for the failure of the system are studied in such a model. There is a need to study the appropriateness of these assumptions on the failure behavior of the components in the context of safety of large systems.

EXPERT OPINION AND SAATY'S METHOD

When there is a paucity of data, the temptation is to use the expert opinion that is available. When there are several estimates given by several experts, one has to obtain the consensus estimate and discuss the uncertainties. In 1977, Saaty [14] introduced a methodology to study unstructured decision problems. This falls in the category of Delphi techniques. Saaty's method assigns weights to objects based on data obtained by pairwise comparison of objects. If there are k objects and aij denotes the relative importance of object i compared to object j, then 0 < aji = 1/aij. Such a matrix is called a "reciprocal matrix." From Perron-Frobenius theory, it follows that the largest eigenvalue, λmax, of a reciprocal matrix is real and that the eigenvector (x1, x2, . . . , xk) associated with this λmax has real and positive components. The normalization wα = xα/(x1 + . . . + xk), α = 1, 2, . . . , k, makes the eigenvector (w1, . . . , wk) unique; this is the weight vector associated with the k objects.

The author has reformulated the problem in terms of logarithmic least squares and obtained estimates of wα, given the reciprocal matrix. For a 3 X 3 matrix, these estimates are identical to the eigenvalue approach; for higher dimensional matrices, though the estimates of wα are numerically different, they are of the same order of magnitude. The author was pleasantly surprised to find some of this material in a recent book by Mirkin [15], of the Soviet Academy of Sciences. Saaty [14] has applied this method to hierarchical problems. One needs to investigate the usefulness of these methods in the context of risk analysis and modify them suitably if necessary. Hitherto, people who used pairwise comparison data were concerned about inconsistent triads and so on. The strong point in Saaty's approach is to accept the fact that human comparisons can lead to inconsistent data and to develop methodologies that lead to sensible conclusions. Several computer programs have been developed at Oak Ridge National Laboratory for the analysis of paired comparison data, which is the input for reciprocal matrices.
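A minimal sketch of the two estimation approaches just described, applied to a hypothetical 3 x 3 reciprocal matrix (this is not one of the Oak Ridge programs mentioned above), compares the principal-eigenvector weights with the logarithmic least-squares (row geometric mean) weights:

```python
# Weights from a reciprocal pairwise-comparison matrix: principal
# eigenvector (Saaty) vs. logarithmic least squares (row geometric means).
import numpy as np

A = np.array([[1.0, 3.0, 5.0],
              [1/3., 1.0, 2.0],
              [1/5., 1/2., 1.0]])      # a_ji = 1 / a_ij (hypothetical data)

# Principal-eigenvector approach
eigvals, eigvecs = np.linalg.eig(A)
v = np.real(eigvecs[:, np.argmax(np.real(eigvals))])
w_eig = v / v.sum()

# Logarithmic least squares: normalized row geometric means
g = np.exp(np.mean(np.log(A), axis=1))
w_lls = g / g.sum()

print("eigenvector weights:      ", np.round(w_eig, 3))
print("log least-squares weights:", np.round(w_lls, 3))
```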

REFERENCES

1. U.S. Nuclear Regulatory Commission, Reactor Safety Study, an Assessment of Accident Risks in U.S.Commercial Nuclear Power Plants, WASH 1400 (NUREG-75/014), National Technical Information Services,Springfield, Va., October 1975.

2. H. W. Lewis et al., Risk Assessment Review Group Report to the United States Nuclear Regulatory Commission,NUREG/CR-0400, September 1978.

3. S. Levine, "Probabilistic Methods in the Nuclear Regulatory Process," pp. 1.1-1 —1.1-11 in Proceedings of theTopical Meeting on Probabilistic Analysis of Nuclear Reactor Safety, May 8-10, 1978, Newport Beach,California, American Nuclear Society ISBN: 0-89448-101-0,1978.

4. W. E. Vesely, "Failure Data and Risk Analysis," pp. VI. 1-1-VI. 1-15 in Proceedings of the Topical Meeting onProbabilistic Analysis of Nuclear Reactor Safety, May 8-10, 1978, Newport Beach, California, AmericanNuclear Society ISBN: 0-89448-101-0,1978.

5. MITRE, IssuejFactbook for the Symposium /Workshop on Nuclear and Nonnuclear Energy Systems: RiskAssessment and Governmental Decision Making, MITRE Corporation, McLean, Va., MTR-79W00038, February1979.

6. National Academy of Sciences, Risks Associated With Nuclear Power: A Critical Review of the Literature,National Academy of Sciences, Committee on Science and Public Policy, 1979.

7. W. D. Rowe, An Anatomy of Risk, Wiley-Interscience, New York, 1977.8. Warren Weaver, Lady Luck, Doubleday, Garden City, N.Y., 1963.9. OECD-NEA, Proceedings of a Meeting of the Task Force on Problems of Rare Events in the Reliability Analysis

of Nuclear Power Plants, Committee on the Safety of Nuclear Installations Report 10. Organization forEconomic Co-operation and Development-Nuclear Energy Agency, Paris, France, 1976.

10. L. Janossy, A. Renyi, and J. Aczel, "On Composed Poisson Distributions-I," Acta Mathematika AcademiaeScientiarumHungaricae 2: 83-98 (1950).

11. E. El-Neweihi, F. Proschan, and J. Sethuraman, "A Simple Model With Applications in Structural Reliability,Extinction of Species, Inventory Depletion and Urn Sampling," Adv. Appl. Prob. 10: 232-54 (1978).


12. M. Sobel, V. R. R. Uppuluri, and K. Frankowski, "Dirichlet Distribution - Type I," in Selected Tables in Mathematical Statistics, Vol. IV, American Mathematical Society, Providence, R.I., 1977.
13. L. R. Abramson, "Redundant Systems Subject to Simultaneous Failures," pp. 111-25 in Proceedings, 1978 DOE Statistical Symposium, CONF-781108, National Technical Information Service, Springfield, Va., 1978.
14. T. L. Saaty, "A Scaling Method for Priorities in Hierarchical Structures," J. Math. Psych. 15: 234-81 (1977).
15. B. G. Mirkin, Group Choice, Winston and Sons, Washington, D.C., 1979.


WORKSHOP II - DISCUSSION

[Organizer's note: Draco lives! Although our intention was to provide a verbatim record of the post-paper question-and-answer session, too many unseemly influences (e.g., unidentified speakers, recording flaws, time pressures) dictated that we reduce our expectations. This section is organized around the questions from the audience for which there exist questioner-prepared records and identifiable responders. The questions are those taken directly from the "blue" sheets; the responses are paraphrased and summarized from the transcripts when the responder could be (reasonably) identified. The responses have not been seen by the panel members, so they all are absolved from sharing whatever faults and fumbles may have been introduced by these draconian procedures.]

Lee Abramson, Nuclear Regulatory Commission. In your talk, you asserted that pc was not a good measure of risk if p is close to 1 or 0. However, if p ≈ 1, then pc ≈ c, and this corresponds to your advice to assume that the consequence c will occur. On the other hand, if p ≈ 0, you seem to say that c should be ignored and that the risk should be evaluated only by p. Because pc depends as much on c as on p, I do not understand this. Please elucidate.

Bill Rowe. The difficulty stems from relatively large uncertainties associated with estimates of p whenever p is small (close to 0) or large (close to 1). Thus, unbiased estimates of p are insufficient for decision-making purposes whenever events are "rare" or "almost certain." Another mechanism is needed for dealing with these situations.

Larry Bruckner, Los Alamos Scientific Laboratory. I have two questions/comments on the use of the Delphi technique: How effective is it? How acceptable is it?

Bill Vesely. There are some very difficult problems in the assessment of a Delphi approach. Efforts are under way to compare Delphi approaches with actual measured data. In particular, NRC (among others) is sponsoring a get-together of some 40 experts in December to work on the problem of human errors. Attempts will be made to get some kind of consensus, but individual appraisals will be maintained in order to get a handle on the variability of expert assessments. The Delphi approach is being used. For example, in the application of seismic phenomena, the problem is to understand the uncertainties and sensitivities and the potential impacts of errors and differences among the experts. These various aspects must be kept separated. That's where new developments and new modeling approaches are needed.

Bill Rowe. The Delphi approach is a good technique for getting a consensus, a scientific value judgment. Anecdotal evidence exists in a number of fields, such as medical diagnosis, that proves that various forms of opinion are important. It's important that studies be undertaken, such as those described by Bill Vesely, to see if the phenomena of interest actually are real and are measurable. An important question: How do you pick the right people to make these assessments?

Ram Uppuluri. The method introduced by Saaty to study unstructured decision problems [see reference in Uppuluri's paper] is applicable to these matters, and it seems to be relatively robust against changes in the matrices used to summarize the experts' opinions.

Larry Bruckner. Even if the Delphi method is shown to be effective, it may not be used or accepted because of people's bias against it. It would require a great deal of training for a person to understand the method and to believe that it's correct. The critics, say, of the nuclear industry would just have a field day with it. It seems, then, that what Ram Uppuluri is saying is that Saaty's method may be more acceptable. Is that correct?

Bill Vesely. The NRC is using the Delphi method now to identify areas where further research is needed - for example, in the seismic program. The Delphi method is not used for licensing reactors at this stage, but it is used to identify factors, data, and models to find out where more research is needed and where more analyses are needed. Right now, it's step-by-step. There's no jump to decision making. If the Delphi method were used exclusively for decision making, the critics would certainly have a field day. As you know, people do make judgments and decisions subjectively and qualitatively and in back rooms. What the Delphi approach, with its subjective estimates, does is to put these estimates down, out in the open, for criticism and sensitivity studies. If there are criticisms that identify some problems, they're out there to be analyzed and reviewed. In many cases, it's perhaps better than the way decisions are made now.

Dick Mensing, Lawrence Livermore Laboratory. The use of Delphi techniques, in a strict sense, leads to a consensus that does not allow for the study of the effect of the variability between experts, which seems to reduce the effectiveness of this technique for use in risk analysis. There was a fairly critical review of the Delphi technique in a Rand Corporation report of a few years ago.

Bill Vesely. If you carry the Delphi method through, you certainly do get a consensus, but you don't have to adhere to that "pure" form. The individual estimates can be kept apart, and they are as important as the consensus. Consensus is not sought necessarily in NRC work - for example, the variability of the human is important in human factors work. An important question is: How can that affect safety risk?

Dick Mensing. Here's a comment on a point you made in your talk about phraseology. Here, Delphi technique phraseology is being used - but that is not really what's meant, in the strict technical sense. Thus, misunderstandings are created, and that's what you were referring to.

A problem that exists in the use of expert opinions is the incoherence of the individuals [i.e., inability to satisfy the axiom P(A ∪ B) = P(A) + P(B) if A and B are mutually exclusive]. Does Saaty's method overcome the difficulty of incoherence?

Ram Uppuluri. You are right; Saaty's method is directed at pair-wise comparison data. It really does not take into account three-at-a-time and so on; so it is still not a very effective tool in that sense.

Dan Carr, Pacific Northwest Laboratory. With regard to statisticians working in the area of risk analysis, I would like to raise a question of ethics. For elaboration purposes, I would like to consider the areas of established science, best guessing, and witchcraft.

The boundaries between these areas are not clearly marked; however, the earmarks of established science are consensus and replicability. I like to think of training in statistics as the epitome of training in the use of established scientific methodology. Certainly, statisticians' contributions towards the growth of science are being increasingly well recognized. However wisely, we see methods we've developed being extensively used in other fields. Further, our success has allowed us to expand into other fields. Clearly, we can see the expansion of statisticians into the areas of biology, computer science, economics, and decision making. Part of this expansion is putting us outside the bounds of established science. In many ways, I view this as a positive move. The area of best guessing is the cutting edge of science. From one point of view, the best of science starts with best guessing. However, in many cases, best guessing has nothing to do with science at all. When statisticians move from the domain of established science to the arena of best guessing, they bring along the trappings of scientific respectability. This creates an ethical problem and poses a risk to our image as scientists. Consider the acceptance of low-dose extrapolation models by the Food and Drug Administration or other regulatory agencies.

We hear arguments by statisticians about which models are conservative and which models should be accepted. Do we want models that are linear low dose? Do we want multiple hit models? The fact that statisticians are discussing these questions presents the illusion that established science is being dealt with. As statisticians, we recognize that for many cases it is impossible to have enough data to establish safe doses or choose between models. We recognize that the argument about which model to use is an argument about adopting guessing rules.

That is not the story being heard by policy makers and the general public. Exactly the same phenomenon occurs when we build elaborate probability models upon the foundations of shaky subjective probabilities. I think it is ethically incumbent upon statisticians not only to state what hat they are wearing but also to make sure the message is heard. Because statements about the lack of an established, scientific foundation for government policies are not welcome to the ears of policy makers, making the message heard is nontrivial.

The consequences of failure to communicate include not only damage to the image of statisticians but damage to our society as best guesses prove false at rates higher than promised and at costs that cannot truly be evaluated.

[Organizer's note: This question is deemed sufficiently important that only minor editing has been performed on the two responses in order to capture the flavor of the moment and to convey the nature of the responders' insights.]

Bill Rowe. You picked a very important point; however, in your first case, I'm not even sure that the probable consensus is part of science. What I think we're concerned with in the scientific effort is what I call positive empiricism. We take data, we analyze the data, and we make hypotheses on how the system under study behaves, and then we use the hypothesis to predict future action and see how well it predicts behavior. We iterate until we get to some point that a number of people agree is predictable enough, and we call this an empirical law. But, as we get into the behavioral sciences, we're down to the measurement levels where it's very hard to get good measures. It's very hard to find empirical laws, and it's very hard to get convergent predictions. So we have the problem of divergent interpretation.

Speculative inquiry, I believe, is part of science. You pick an idea and theory and you see if this is a reasonable hypothesis, and you try to see how well it predicts behavior. You may reject it and try another one. I believe this process is part of science. But, when it comes to picking a particular position within a range of speculative or divergent interpretations - that's where I think the problem begins to leave the realm of science and enter the field of the politics of science. Here one might pick a particular outcome to wave a flag to get a lot of headlines, having picked the worst possible case in interpretation. Or it may be that one picked this particular one because it agrees with some predetermined belief. It's a problem when we must deal with belief situations. I suspect that the scientist has a role in this area, but it's not the role of the scientist alone - it's the same role that exists for anyone who's involved. The point is that, when we get into this particular area, the scientist has to know when he is talking science and when he's talking something else. What I object to is the scientist who comes and says, "Based upon my scientific background, I am saying this," and expects extra credit for being a scientist.

Bill Vesely. I tend to agree with that, and I suggest that statisticians always keep the decision maker informed of what are best guesses, what is actual data, and what is witchcraft. Sometimes decision makers lose sight of these distinctions, and I think they need an objective outside opinion to remind them quite often.

Roger Moore. Regarding the remark that "feuding" among statisticians destroys their effectiveness in the decision-making process, would you elaborate?

Bill Vesely. The analyst/decision maker/engineer certainly doesn't understand the fine points of Bayesian vs classical issues, for example. The attitude and reaction is, "I'm not going to do any of that. They're arguing about things that are unimportant or that I can't understand, and therefore I'm going to guess or use my intuitive feeling about uncertainty and about what a best estimate is. I'm going to disregard all of this formalism on statistics and uncertainties, and I'm going to regard it as elegance." That word ["elegance"] is often used in discussions and written documents to characterize statistical methodology. It's very wrong because too much is therefore disregarded - coding, documenting, recording what's been done, and picking out what's important to sensitivity. When subjective, purely subjective, decision making is done and statistics are disregarded, then there no longer exists a "handle" on what is data and what are best guesses and what are uncertainties and where more information is needed. That's the reaction that comes about because of this feuding. Because of this, statistics is viewed as an elegant science that is not very practical and not needed for decision making.

Lynn Schaeffer, BDM Corporation. Many persons are concerned about the apparently arbitrary way of establishing rather restrictive standards for allowable radiation doses resulting from releases of radioactivity

from nuclear power plants. In a recent publication which I coauthored on the uncertainty in the prediction of nuclear dose due to 131I transported via the pasture-cow-milk pathway (Nuclear Technology, mid-August, 1979, p. 99), the suggestion is made that a probabilistic approach be taken in assuring compliance with regulatory standards. Is any consideration being given by the NRC to adopting a probabilistic approach in establishing or assuring compliance with regulatory standards?

Bill Vesely. NRC, in the past, has done some of this - let's say, behind closed doors - and has come out and made a decision and said, in effect, "We are not going to use these probabilistic criteria." But, just in the past few months, a project has been started in which NRC is going to attempt to formulate - partly because of the insistence of the Congress and the NRC Advisory Committee on Reactor Safeguards - actual numerical criteria. I believe that in the beginning these criteria will probably be applied only to hardware with independent failures. Later, such contributions as human errors will be incorporated. The goal is to have numerical standards formulated for further review in three years which will apply to reactors only. The complete fuel cycle will be considerably more complicated. To apply these criteria, the applicant would be expected to go through a calculation and then compare his upper 95% confidence bound with, say, NRC's criterion. As seen right now, these will be looked on as "add-on" criteria, not replacement criteria, at least until more experience is gained. The goal is to come out with some actual initial numerical criteria after one year, as well as guidelines, models, and data to be used in trying to satisfy the criteria. And the American Statistical Association is being asked to form a group, whatever is in their ability, to help and to consult in this formulation of criteria and approaches. Statisticians will be getting more into the regulatory process at NRC. It's coming; it's going on right now. How these criteria will work out is something else, but they are being formulated now.

POSTSCRIPT

Three items appearing since the Symposium adjourned should be of some interest to participants:

1. The papers by Uppuluri and by Moore referred to a MITRE-sponsored symposium/workshop. Reference: Symposium/Workshop on Nuclear and Nonnuclear Energy Systems: Risk Assessment and Governmental Decision Making, Proceedings, MTR-79W00335, MITRE Corporation, McLean, Va., September 1979.


2. A popularized discussion of risk appeared. Reference: Epps and Garrett, "They Bet Your Life," The Washington Post Magazine, November 11, 1979, pp. 38-46. Some tidbits: "The new science of risk assessment may be an economic boon or hazardous to your health." "Risk assessors tell us what our chances are in the race between innovation and catastrophe." "An inadvertent launch will spoil your day." "A risk is acceptable when those involved are no longer apprehensive about it." "You have got to differentiate between risk assessment and reality." "Sometimes it is better to curse the darkness." "Scientists have to have a little humanity."

3. The precept that an educated society is an understanding society keeps popping up. The following item, although specifically dealing with improving the understanding of radioactivity, contains the seeds of issues closely allied with risk:

Thirty years after Hiroshima and more than 80 years after the discovery of radioactivity, the vast majority of Americans do not have the vaguest idea what it is. Even the common units in which radioactivity is measured - curies, rads, and rems - are as meaningless to people as the most impenetrable gibberish of advanced mathematics. This is a form of ignorance that is risky to live with.

It also produces fear of the unknown. During Three Mile Island, a tiny amount of radioactivity was discovered in the milk at neighborhood dairies. This was duly reported by the media - 12 picocuries per liter found in local milk - and generated cries of panic. But wait. Picocuries? What in the hell is a picocurie? In fact, the amount of radioactivity discovered is several thousand times less than an amount that would have justified precautionary measures. But how could you know? And who could trust a government that in the past had - perhaps knowingly - allowed members of its armed forces and citizens of Utah to be exposed to clearly dangerous levels of radioactivity?

Even if the nuclear industry were dismantled tomorrow, radioactivity would still be inescapable. It comes at us as cosmic rays and occurs naturally in radioactive substances in the earth and in all living matter. Trying to convince a congressional committee that certain standards for the disposal of nuclear waste are too strict, Nobel laureate Rosalyn Yalow pointed out the other day that the amounts of radioactive potassium and carbon present in a normal human being would require that person - if he or she were a dead laboratory animal - to be disposed of as nuclear waste.

The most important source of radioactivity - and here is where ignorance breeds risk - is medical and dental work. As much as 99 percent of all normal exposure above the natural background comes from medicine. X-rays are the most common source, and the most grossly overused. Though most medical exposures are low doses, the thing to remember is that the effects of low levels of radiation are simply unknown. The only safe rule to follow in the face of this uncertainty is that any unnecessary exposure is unwise.

Modern medicine is here to stay and so, apparently, are nuclear reactors. Society could live with them a lot more comfortably, and more safely, if Americans had a basic knowledge of what radioactivity is, what its properties are, which parts of the body are most sensitive, what annual doses are thought to be safe, and what amounts are known to be dangerous. Providing that knowledge is the responsibility of the schools, the federal government and the scientific establishment, and none has done the job very well. Whether people will care enough to learn it remains to be seen.

Reference: "Radioactive Ignorance," Editorial, TheWashington Post, December 17, 1979, p. A14.


Workshop III
Analysis of Large Data Sets

Wesley L. Nicholson, Organizer

INTRODUCTION

I am Wes Nicholson, a statistician at DOE's Pacific Northwest Laboratory. This is the workshop on the analysis of large data sets. To lead off, I'd like to comment on the question of why we are having a workshop on the analysis of large data sets. Several years ago, Jim Pool, Director of the Applied Mathematical Sciences (AMS) Research Program for DOE, initiated a dialogue with the DOE laboratory and university contractors that he supported on this program. The idea was to evaluate the research sponsored by the AMS program. Was this research appropriate for such sponsorship? Was it relevant to DOE's needs? And, a rather pragmatic question, could it be defended and sold to the budgetary people in Washington? As a result of this dialogue, specific research areas were selected in the various subdisciplines of applied mathematics. Focus on a few areas would mean that the combined mathematical talent within the DOE laboratories would be able to turn out a significant product with immediate application to important DOE programs. In statistics, three of these areas were model evaluation, risk assessment, and the analysis of large data sets. At this time, it is not clear what the level of research will be in each of these areas. One function of the workshops at this symposium is to clarify that question.

At Pacific Northwest Laboratory, the AMS program is centered around the analysis of large data sets. We call this program ALDS. We do not feel that our project per se is going to solve all of the data analysis problems faced by DOE. On the other hand, we do feel that our project can provide a focus for the scientific community.

To start the workshop, I would like to make some comments about the need for the analysis of large data sets within DOE. Historically, the AEC, then ERDA, and now DOE has been involved in many activities that have produced masses of data. These activities fall naturally into several program areas. The first of these is the area of monitoring programs. At every facility involved in the process of transforming natural uranium into weapons-grade material, there has been a continuing program of inventorying fissionable materials. Now, we are concerned with the safeguarding question for fissionable materials. Certainly that mass of inventory data contains information about the plausibility of safeguards. At each of these facilities, there has been a human dosimetry monitoring program. Now, we are beginning to think about health effects. PNL's Ethel Gilbert has been involved in the health effects problem and has analyzed the Hanford Mortality Data. She reported on part of that analysis at the Second DOE Statistical Symposium in 1976. There has been environmental monitoring in the neighborhood of each of these facilities and also on a worldwide basis. We are beginning to think about alternative ways of producing energy. It becomes important to evaluate the effect on the environment of the atomic energy business as opposed to other forms of producing energy.

In addition to monitoring programs, there have been long-term experimental programs at research sites within the DOE community. For example, there are the biology programs to evaluate the radiation effects to animals and to ecological systems. There are cooperative experiments going on within DOE. For example, at several DOE laboratories, researchers are looking at the effects to the lung of large animals, say beagle dogs, from the uptake of plutonium dust. There are metallurgy programs looking at the radiation effects to materials - that is, what happens to the structural integrity of metal exposed to a radiation environment for a long period of time. Materials scientists comment on the need for putting together a large data base of all such experimental results that have gone on not only in this country but all over the world. There have been a large number of dispersion experiments to determine the extent of contamination from radiological disaster. Here a tracer material is used to simulate the radioactive one. In each of these instances of many similar experiments, the experimental results have not been looked at in great detail as a common body of information. The chances are that they will not be looked at until the looking-at process is made a lot more palatable than it currently is for the data analyst.

In addition to these historical programs, there is a current need for the analysis of large data sets. A particularly important example to DOE is the need of the Energy Information Administration, the arm of DOE that is charged with determining the country's energy needs, categorizing our natural energy resources, and quantifying our use and depletion of those resources. In fact, we have the whole area of estimating energy resources as a workshop at this symposium. A particularly important problem to the Energy Information Administration surrounds the use of complicated computer models of the nation's energy economy. These models involve literally thousands of input variables.

The area of real-time experimentation or real-time monitoring of natural phenomena is one where data are created at almost mind-boggling speeds. In some situations, data are flowing by so fast that we are hard pressed to even record it. Possibly just browsing it and remembering the salient features is the best way to handle the situation. The high-energy physics community has done an excellent job in real-time condensing of data in their experiments. This data analysis expertise has come at a tremendous cost. We like to think that we can reproduce that kind of ability for data analysis on a more general scale without going through the tremendous cost for each application.

Scientific experimentation today is so complicated and diverse that there are many scientific institutions involved. Now, we have the problem of linking together the results. We have to consider distributed computer networks so we can bring data together at a common site for computer analysis.

Our society tends to create large data sets because we are very conscious of causes and effects, health effects, environmental effects, etc. We initiate more and more retrospective studies to identify causes. Usually we are considering marginal effects. So we need a continuum of situations, constituting a very large data set, to identify these effects. I'm sure that you in the audience can come up with other examples. The important point is that there is a definite need for automated capability for looking at large data sets. In particular, the DOE has many situations where such a capability needs to be applied to understand the information content of large data sets.

In addition to need, there is clearly a strong interest in the statistical community. Several years ago at Dallas, the Institute of Mathematical Statistics held its first special topics conference. The topic of the conference was the analysis of large data sets. Those of us who attended that meeting came away with the feeling that the statistical community had identified an important problem; that in specific situations people had done an excellent job of analyzing large data sets; but that in the final analysis we had not even a semblance of a general methodology. Clearly, much work needed to be done in the area. Thus, if DOE statisticians work in this particular area, they can expect to get a lot of help from other portions of the scientific community.

We wish to accomplish several things in this workshop. First, we want to describe to you the large data set project at Pacific Northwest Laboratory. Second, we want to entice you to think about large and how it affects the data analysis process. Third, we want to describe some specific areas of research which would materially improve our ability to analyze large data sets.

The analysis of large data sets project at the Pacific Northwest Laboratory has a very simple goal, which is to develop an in-house DOE capability for the analysis of large data sets and to apply that capability in DOE's best interest. The capability is, as much as possible, to be transferable to other DOE laboratories. For the past year, we have been asking ourselves some fundamental questions: What is a large data set? What is common to large data set analysis problems? What statistical tools are needed in most large data set analyses? Are the classical tools sufficient? What computer hardware is needed to handle large data set analysis? Is this hardware affordable to the common man? Does all the software exist to analyze large data sets? Or do we have to develop additional software? Dan Carr and Bob Burnett will touch on these issues as they describe our project.

Leo Breiman is an established statistical consultant. He has actually analyzed a number of large data sets. As a result, he is in an excellent position to evaluate the status of our statistical methodology with respect to large data set analysis. His talk is a sharing of his insight on this problem. Those of you who are interested in doing research on statistical methodology for the analysis of large data sets should pay particular note to his comments. Finally, Dick Beckman of the Los Alamos Scientific Laboratory statistical section will talk about their experience in the analysis of large data sets. Here is an illustration of the current status of large data analysis at a DOE laboratory.


Before I introduce the speakers, I wish to make one final comment about our analysis of large data sets project. Very early in our thinking about such a project and how it ought to be conducted, the statisticians involved made the decision that the only way to make progress in this area in an economical manner, both from the standpoint of the statistician's time and the money involved, was to involve both statisticians and computer scientists on an equal basis. This equality of disciplines and financial support is reflected today in the fact that our first speaker, Dan Carr, is a statistician and our second speaker, Bob Burnett, is a computer scientist.


The Many Facets of Large

Daniel B. Carr

Pacific Northwest Laboratory
Richland, Washington

INTRODUCTION

Over the last year, I have talked to many people about the analysis of large data sets. One of the first questions I am asked is "What do you mean by large?" This question is raised not because people do not believe there is such a thing as large data sets but because people want a simple and precise definition. The definition we have given has been short: "The data set is large when the size of the data is the major factor in preventing the convenient and effective analysis of the entire data set." This definition, while less than satisfactory, clearly suggests that large is not simply defined and that a true understanding of "large" develops after actually analyzing large data sets.

In the rest of the talk, I am going to characterize large from four general frameworks. The first general framework deals with the computing problems and characteristics encountered in analyzing large data sets. The second general framework presents the way a statistician might conceptualize large data sets and presents some of the approaches he might propose for dealing with the problems encountered. The third general framework has to do with a new concept of efficiency that is engendered by the analysis of large data sets. The final framework has to do with the new possibilities that go hand in hand with large data sets.

COMPUTING PROBLEMS AND CHARACTERISTICS

The first general framework is computing problems and characteristics. I've broken this down into five areas: storage requirements, data access difficulties, type of processing involved, data base maintenance, and programming considerations. In asking statisticians what they think large is, a frequent response deals with the storage requirements for the data sets. The common response runs something like this. If the data will fit in core, it is a small problem. If the data will fit on one disk, it is a medium problem. If multiple storage devices are required, it is a large problem. Although the distinctions here are a little fuzzy because of various sizes of disks and what's considered "in core" (for example, with paging computers), a general concept is being suggested. That concept is physical access speed. If physically accessing the whole data base is awkward or takes a long time, then the data set is large.

The second computing area involves logical data access. With large data sets, huge data dictionaries (directories) may be required to provide logical access. Describing the data, where it's stored, and its history can require a massive documentation effort. Documentation is usually far from complete. Frequently, the best documentation is actually in the minds of personnel associated with data sets. These personnel may be very aware of both data location and its quality. Using personnel to provide logical access to the data may not be that bad, but certainly it's disastrous when such personnel become disassociated with data sets, for whatever reason. In any case, logical data access difficulties frequently accompany large data sets.

Another way to characterize large is in terms of the type of computer usage. Consider the contrast between data analysis and number crunching, where number crunching includes such procedures as Monte Carlo simulations, the solving of partial differential equation models, and linear programming. In terms of input and storage requirements, the demands made by data analysis are much greater. In terms of computation, the demands of data analysis are relatively small. In terms of output (with all the visual displays, plots, and summaries of the data that we like to look at), the data analysis demands are again somewhat greater. Thus, large data set analysis can be distinguished from number crunching by the relative emphasis on input/output operations.


The fourth area is data base maintenance. With large data sets, the need for automated editing becomes apparent. The automated detection of bad data and the automated replacement of bad and missing data become essential if serious progress is to be made analyzing many large data sets. Another type of data base maintenance problem is the limited storage available for multiple copies. In many situations it may be advantageous to store the transactions to be applied against a master file rather than to continually update the master file. Updating and the sequence of updating may have to be carefully controlled. Data base maintenance problems are characteristic of large data sets.

The fifth general area involves programming considerations and the need for interaction between computer scientists and statisticians. Statistical concepts such as sampling may provide a powerful way to reduce processing time. However, accepting or rejecting data as each case flows by may be a very slow method. The computer scientists may be able to obtain random samples by using random disk addresses. Thus, a combination of ideas from the disciplines may provide optimal methods.
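To make the record-sampling idea concrete, here is a minimal sketch (not from the talk) of drawing a random sample by seeking to random record offsets instead of scanning every case; the file name, fixed record length, and field layout are illustrative assumptions.

```python
import random
import struct

RECORD_LEN = 32          # assumed fixed record length in bytes
FMT = "<i28s"            # assumed layout: one int key plus 28 bytes of payload

def sample_records(path, n, seed=0):
    """Draw n records by seeking to random record offsets (without replacement)."""
    rng = random.Random(seed)
    with open(path, "rb") as f:
        f.seek(0, 2)                      # jump to the end to learn the file size
        n_records = f.tell() // RECORD_LEN
        picks = rng.sample(range(n_records), min(n, n_records))
        out = []
        for i in sorted(picks):           # sorted seeks reduce disk head movement
            f.seek(i * RECORD_LEN)
            out.append(struct.unpack(FMT, f.read(RECORD_LEN)))
        return out

# Usage (hypothetical file):
# sample = sample_records("cases.bin", n=1000)
```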

Sorting problems provide another illustration. When the number of cases is small, who cares how statisticians sort? As the number of cases gets larger, we need to use the best techniques computer science has to offer. However, even that may not be good enough, and statistical alternatives to sorting may have to be considered. If, for example, we're after order statistics, a generalization of the secretary problem may provide reasonable estimates. Thus, it is not always clear which discipline will have the strongest claim to the most recent "solution."
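As one hedged illustration of a statistical alternative to sorting, the sketch below estimates a quantile from a one-pass reservoir subsample and sorts only that subsample; it is not the secretary-problem generalization mentioned above, just one simple scheme, and the stream and buffer size are placeholders.

```python
import random

def estimate_quantile(stream, q, k=10_000, seed=0):
    """One-pass quantile estimate: keep a size-k reservoir sample, then sort only that."""
    rng = random.Random(seed)
    reservoir = []
    for i, x in enumerate(stream):
        if i < k:
            reservoir.append(x)
        else:
            j = rng.randint(0, i)        # classical reservoir-sampling step
            if j < k:
                reservoir[j] = x
    reservoir.sort()                      # sort k items instead of the full data set
    return reservoir[int(q * (len(reservoir) - 1))]

# Example: approximate median of a large simulated stream
# med = estimate_quantile((random.random() for _ in range(10**7)), q=0.5)
```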

Currently, few statisticians have been trained in careful algorithm design and in detailed understanding of the computer being used. On larger problems, these considerations can be of major importance. As a simple example, consider the multiplication of very large matrices on modern paging computers. One method of multiplication can lead to thrashing as the computer accesses the various elements in a matrix that spans page boundaries. This can be disastrously expensive. Thus, with large data sets, we need the skills of the computer scientists.

The last area I want to mention explicitly, the display of information, provides a variety of interesting problems, ranging from nuisance problems, like how to pick good scales on the fly and how to avoid overplotting, to deep human perception problems, like how to communicate effectively in more than two dimensions. Large data set analysis requires the attention of specialists outside statistics and provides a host of problems whose solutions will require the cooperation of statisticians and computer scientists.

STATISTICAL CONCEPTS OF LARGE AND SOME APPARENT APPROACHES

A second perspective of large comes from considering the way statisticians think about large and from looking at the initial ideas they have for tackling the problems they see. When a statistician thinks of large, he typically thinks of (1) a large number of cases, (2) a large number of variables, and (3) a complex logical structure among the variables and cases. When we think about a large number of cases, sampling immediately comes to mind. Both classical and sequential methods are applicable. When we think about a large number of variables, we typically split the variables into dependent and independent variables. For both types of variables, the immediate reaction is to reduce their number. The most common practice is to ignore variables that are not thought to be interesting or important. Beyond this, for dependent variables, we often think of dimension-reduction techniques, such as principal components and factor analysis. For a large number of independent variables, we usually think of model approximation to reduce the number of parameters. When we think about data set complexity, however, not so many established ideas come to mind.

I will try to characterize data complexity in terms of three general areas. One view of data complexity can simply be described as a lack of uniformity. For example, when measurements are collected over time, the measurement procedure or measurement devices frequently change. As a specific example, the film badge I wear to measure my exposure to radiation is not the same device used 15 years ago. Studies related to Hanford radiation exposure need to consider this factor. Another way measurements change over time is in the recording procedure. Both situations are examples of potential heterogeneity that can be related to known or readily discovered factors. Lack of uniformity or lack of homogeneity may also be a purely empirical relationship among multiple variables. This lack of uniformity leads directly into the conflicts between accuracy and parsimony of description that typify large data set analysis.

A second view of data complexity involves a data base spanning different but related data sets. If we look at all inhalation experiments in the DOE labs, it would seem that the current approach is to analyze each set in isolation or at best in terms of the unqualified intuitions of the biologists. When data sets are quite similar, we should be able to use methods like empirical Bayes to extract more information. When there are some substantial differences (such as the differences between species), it is much more difficult to exploit the similarities that are observed. However, it would seem that a deep understanding of the processes involved would make this possible. We have a real need for integrated methodology here. Both of these characterizations of complex data raise the question, "How can we analyze the data base as a whole when there is substantial heterogeneity between subsets of the data?"

The third aspect of complexity involves nontrivial logical structures. These include vector value relationships, hierarchical structures, and relational networks. All these relationships may apply to the same data set as the statistician adopts different viewpoints. Suppose we had the total defense expenditures listed for each county in the United States. We might view this data as hierarchical in nature and want to summarize upward through the structure, thus obtaining state totals. If we then look at these state totals in relationship to other states, we are really looking at a relational network. We may also want to look at each state's total in relationship to the years of seniority of its two senators, which gives us a bivariate ordering of values. We may want to look at the state's defense "income" in relationship to the committee memberships of its senators. This view might be exceedingly complex. Clearly, analysis of data sets can involve many different perspectives, and the logical relationships associated with these perspectives can be nontrivial. While complexity is not unique to large data sets, it stands out because it compounds other problems.
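A small sketch of the hierarchical view just described, rolling county figures up to state totals and then treating the totals as a flat table for relational comparisons; the records are invented for illustration.

```python
from collections import defaultdict

# Hypothetical county-level records: (state, county, defense_dollars)
counties = [
    ("TN", "Anderson", 4.2e6),
    ("TN", "Knox", 9.9e6),
    ("WA", "Benton", 7.5e6),
]

def state_totals(records):
    """Summarize upward through the hierarchy: county -> state."""
    totals = defaultdict(float)
    for state, _county, dollars in records:
        totals[state] += dollars
    return dict(totals)

totals = state_totals(counties)           # {'TN': 14100000.0, 'WA': 7500000.0}
ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # relational view of the state totals
```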

A NEW CONCEPT OF EFFICIENCY

Now, I would like to raise another aspect of large data set analysis. Imagine that you're a data analyst taking on a new project. When you come to work one morning, on your office floor you find boxes containing 60 reels of tape, and on your desk you find a stack of about 20 sheets of paper documenting the variable names and their locations on the tapes. Let your mind leap forward in time as you think of the many conversations you're going to have. There's the conversation with the computer scientist as you try to find out why you can't read tape number 23, the repeated calls to your client about why the tape 25 variables don't correspond to what's documented, and the days and days of conversations with the people who know the data and the dynamics involved. Think of your effort to discover how the data were collected - to bring out unanticipated relationships in the data. Imagine the weight of the computer output that you will carry to your office. Think of the frustration you experience as you toss it away because what you have been told is incorrect - or the frustration you feel because the whole data set should be thrown away. As you think of all these things and more, you begin to realize how awesome large data set analysis can be and how awesome it is in terms of statistician lifetime.

When data sets are small, we think statistical efficiency is important. Using assumptions and models, we try to get every last drop of blood out of the data. As the data set gets larger, computing efficiency becomes important. Computing costs can be prohibitive, so we want to get as much information as possible for the computing money spent. This same principle also applies to statistician lifetime. The time required to analyze large data sets forces us to think of the total cost of analysis and of statistician convenience. If we're to dig canals, we need earth-moving machinery and not sharper shovels.

NEW POSSIBILITIES

With all this discussion of problems, difficulties, and costs associated with large data sets, it's refreshing to consider the new possibilities that are available to us with large data sets. For example, it's been suggested that we could use sampling procedures on these large data bases. When we get our hands on large sets of homogeneous data, we have the possibility of obtaining fixed-width confidence intervals and tests of specified power. Certainly, we can think of new uses for cross-validation. We can get good estimates of distributions. We can move away from our assumption making and rely more heavily on nonparametric procedures. Further, we have the opportunity to extensively investigate tail distributions. If we look at, say, the upper 2.3% of the data on the basis of one variable, we may have enough cases left to do a substantial analysis. Finally, there are lots of possibilities in terms of display. Some ideas about display are going to be presented in the next talk. However, there is just one experience I want to tell you about.
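Two of these possibilities can be sketched in a few lines; the simulated data, interval width, and tail fraction below are illustrative assumptions, not values from the talk.

```python
import math
import random
import statistics

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200_000)]   # stand-in for a large homogeneous data set

def n_for_fixed_width(sigma, width, z=1.96):
    """Sample size for a fixed-width 95% CI for the mean: half-width = z*sigma/sqrt(n)."""
    return math.ceil((2 * z * sigma / width) ** 2)

n_needed = n_for_fixed_width(sigma=statistics.pstdev(data), width=0.01)

# The upper 2.3% tail on one variable is still a sizable sample in its own right.
cutoff = sorted(data)[int(0.977 * len(data))]
tail = [x for x in data if x >= cutoff]                    # roughly 4600 cases left to analyze
```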

Recently, we put up a data base of seismic data. This data base included the epicenters of the quakes in x, y, z coordinates for the whole globe. We put this data up on a color raster device with two pictures, one red and one green, offset 5 degrees. Just as with the old 3-D comic books, we are able to look at this display with a red and green lens and see in three dimensions. The exciting part for me occurred when we looked at Japan. I could clearly see the Asian and Pacific plates. That view made the tectonic plate theory come alive. At the same time, I was imagining looking at three-dimensional data as if I were a future-day astronaut piloting his way to different galaxies: I was seeing one more way to get a feel for three dimensions. Much work needs to be done in terms of speeding up graphical devices and understanding how the human mind processes visual information. However, even though our capabilities are rudimentary, the door to the third dimension is open right now.

In conclusion, there are many facets to large data set analysis. We've discussed computing problems and characteristics, a statistician's conceptualization of large, the need for statistician convenience, and some of the opportunities provided by large data sets. Through this characterization of large, I have hinted at a host of problems that need to be solved. A few may prove unimportant and a few may be unsolvable, but I believe, with the help of creative minds throughout the DOE labs and the university community, that major, exciting progress will be made in our abilities to analyze and present information from large data sets.

The Analysis of Large Data Sets Project—Computer Science Research Areas*

Bob Burnett
Pacific Northwest Laboratory

Richland, Washington

ABSTRACT

The Analysis of Large Data Sets project at Pacific Northwest Laboratory provides the focus for a high degree of interaction among the statisticians and computer scientists involved. This paper addresses some of the areas of computer science research and software development required to create and support a large data set analysis capability. The data analyst must be able to effectively initiate, monitor, and control the data analysis process via a convenient, easily learned method of communication with the computational resource. An "obvious" user interface is being developed to meet this need. A self-describing data file structure and associated data management modules will provide a common data interface among diverse software packages. In addition, the self-contained data descriptions and compact data representations allow "complete" characterization of the data items and provide for economy of data storage and rapid data access. Advanced statistical graphics displays and interactive graphical input techniques will provide the data analyst with a powerful tool with which to visualize complex data relationships. Areas of cooperation and interaction with other research laboratories and universities are also discussed.

INTRODUCTION

The Analysis of Large Data Sets (ALDS) project provides an exciting opportunity for close interaction among the statisticians and computer scientists at Pacific Northwest Laboratory (PNL) and for cooperation and interaction with researchers elsewhere in the scientific community. The computer scientists who are working on the ALDS project are dedicated to providing the statisticians with a computational laboratory which is especially tailored to large data set analysis, to the extent that many of the obstacles and operational difficulties usually associated with "large" data sets can be at least partially alleviated. That is our "bottom line" goal. In reaching this goal, however, some very interesting and challenging computer science research and development tasks need to be undertaken.

*Work supported by the U.S. Department of Energy under Contract No. EY-76-C-06-1830.

Our motivation is driven by two basic questions. First, what are the hardware requirements of a computer system for large data set analysis? We feel that the data-directed, iterative, open-ended nature of data analysis in general (and large data set analysis in particular) requires a decentralized computational resource which is designed primarily for interactive computing, which can be configured for fast input/output, and which is easily interfaced to graphical output and input devices. At present, such requirements can be most effectively met by a high-performance virtual memory minicomputer. Such a resource has the additional desirable property that it is affordable by a large community of potential users. We also feel that recent advancements in high-speed, high-resolution color raster graphics technology have created the potential for such a device to be an extremely valuable graphical visualization tool for displaying data and analysis results. Additionally, we are planning to conduct research into the uses of microprocessors attached to the main computational resource to perform special distributed functions which can significantly increase the overall speed and efficiency of the system.

Secondly, what are the software requirements in terms of statistical algorithms, data management, graphics, user interfaces, and overall system integration for supporting large data set analysis? Some possible answers to this question will be discussed in the remainder of this paper. Areas that will be covered include an "obvious" user interface, a self-describing data file structure, a common method of interfacing diverse software packages, a set of "kernel" data manipulation functions, statistical graphics, and microprocessor-based external function implementation. It should be noted that we are placing a particular emphasis on software for interactive data analysis.

"OBVIOUS" USER INTERFACE

The user interface is the window through which the data analyst views and invokes the computational tools at his disposal. What do we mean by an "obvious" user interface? We mean that at all times during the interactive data analysis process, the statistician should know what the currently available options are for further analysis or processing and how each of these options can be called into action. Thus "obvious" is distinguished from "simple": the set of options and commands available to the user will probably not be simple, because a certain degree of complexity is unavoidable if the set of available tools is to be sufficiently powerful and flexible. Flexibility and a wide variety of capabilities (e.g., facilities to save and repeat procedures, and automatic logging and documentation aids) are required to meet the statistician's need for convenience. Yet the use of these capabilities must also be convenient (i.e., easy for the statistician to understand and use).

One technique that is being developed for the ALDS user interface is a common command format and process invocation method for all major software components within the ALDS system, including data management, graphics, and statistics algorithms and packages. This command format must have a natural syntax and semantics; the vocabulary and mnemonics must be those with which the data analysts are familiar. Commands must exist on several levels of brevity for both new and experienced users. A hierarchical "help" facility is being designed to make the commands "self-explanatory." If the user forgets to enter a required parameter or file name, the system must detect this situation and prompt the user for the missing information.
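The toy dispatcher below is meant only to convey the flavor of such an interface (the command names and parameters are invented, not ALDS commands); it prompts for any required parameter the user omits instead of failing with a cryptic code.

```python
# Hypothetical command table: command name -> required parameters.
COMMANDS = {
    "plot":    ["file", "x", "y"],
    "summary": ["file"],
}

def run(command, **given):
    """Invoke a command, prompting for any missing required parameter."""
    if command not in COMMANDS:
        print(f"Unknown command '{command}'. Available: {', '.join(sorted(COMMANDS))}")
        return None
    params = dict(given)
    for name in COMMANDS[command]:
        if name not in params:                       # "obvious": ask, don't abort
            params[name] = input(f"{command}: please supply '{name}': ")
    print(f"Running {command} with {params}")        # real work would go here
    return params

# run("plot", file="counts.sdb")   # would prompt for 'x' and 'y'
```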

A common error-handling system is also needed to detect and intercept all types of error conditions, regardless of their origin, and report these errors to the user in a common, understandable way (no undecipherable error codes). In some cases it will be possible for the user to correct the error and continue the interrupted process. In all cases, however, the user must be informed of all available options for handling the specific error condition which has occurred.

SELF-DESCRIBING DATA FILE STRUCTURE

As part of the ALDS data management effort, we are developing a self-describing binary (SDB) data file structure designed especially for scientific data bases. (Thus, we have a dual meaning for the acronym "SDB.") The SDB files will hopefully provide "everything you always wanted in a data file structure - and maybe a few things you've never even thought about."

A typical problem with large collections of data is that related data comes from a variety of sources and is thus organized in different ways and in diverse formats. All the data must be brought together onto some common medium and placed in some common format before it can be effectively processed and analyzed as a whole. A related problem is that stand-alone software packages such as SPSS and MINITAB have their own (often unique) file format to which the data must conform for processing by that package. It is sometimes desired to pass the output results from one package as an input file to another package, which again requires reformatting a data file. The SDB file structure will alleviate these problems by providing a common file format which can serve as a data interface for a wide variety of statistical, data management, and graphics packages.

Another common problem relates to the description of data. There is often a lack of adequate software facilities to provide for storage and manipulation of the descriptive characteristics (both qualitative and quantitative) of the data items and for verbal documentation of the origin, history, and general features of a data file or data base. The attributes and characteristics of the data are usually stored separately from the data itself, in a data dictionary or directory file. These data dictionaries can themselves become quite large and complex. The verbal text describing the history and overview of the data base usually must be stored in yet another separate documentation file. The SDB file structure is intended to provide for the storage and manipulation of, in some sense, a "complete" description of data, including data attributes, characteristics, and documentation. Furthermore, these descriptors will be stored together with their associated data items in one file. The descriptors will be stored in a header block preceding the data elements.*

Additionally, the SDB data file structure will make it possible to store the data and their descriptors in a compact, efficient representation. Block-structured direct-access input/output techniques will provide for fast data access. Both of these characteristics are extremely important to the storage and manipulation of large data sets.

The specific types of descriptors that an SDB file will contain are

• physical (data location) descriptors, which include data types and lengths, storage structures and indexing modes, and all the counters, pointers, and indexes necessary to locate the physical position of any given data item within the file;

• logical (data attribute) descriptors, which include dependent variable labels, multiple shared attributes (independent variable labels and values) for crossed and/or nested designs, integrity constraints (both logical and quantitative), and missing value codes; and

• file descriptors, including user-supplied comments and an update log containing date, time, software program name, and other pertinent information for each update of the data file.

In conjunction with the SDB file format design, a set of "kernel" data manipulation functions is being developed to store, retrieve, and update data elements and descriptors within SDB files. The "kernel" functions are the minimal set of data manipulation functions which are essential to support the requirements of large data set analysis. These functions are being implemented in efficient, functionally independent modules in order to minimize processing overheads and achieve fast retrieval of data. Our goal is an efficient, streamlined set of data manipulation routines especially tailored to the data management needs of large data set analysis.

*A data directory file will also be provided so that the location of specific data sets can be found quickly, but the directory file will contain only compact summaries or indexes of the descriptors stored in each of the data files. If the user knows which file contains the data he wants, he can bypass the directory and access the data directly from the SDB files. Thus, the data directory in this system can be thought of as a data management aid and is not a necessity to the SDB data management system.
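As a rough sketch of the idea, and not the actual SDB layout (which the paper does not give), the code below writes a self-describing binary file whose header block carries the descriptors and then reads one variable back through a small "kernel" retrieval function; the descriptor fields shown are assumptions.

```python
import json
import struct

def write_sdb(path, columns, comments=""):
    """Write a header block of descriptors followed by packed column data."""
    header = {
        "comments": comments,
        "columns": [{"name": n, "type": "f8", "length": len(v)} for n, v in columns.items()],
    }
    hbytes = json.dumps(header).encode()
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(hbytes)))      # physical descriptor: header length
        f.write(hbytes)                              # logical and file descriptors
        for values in columns.values():
            f.write(struct.pack(f"<{len(values)}d", *values))

def read_column(path, name):
    """Kernel function: locate and return one column using only the header."""
    with open(path, "rb") as f:
        (hlen,) = struct.unpack("<I", f.read(4))
        header = json.loads(f.read(hlen))
        offset = 4 + hlen
        for col in header["columns"]:
            nbytes = 8 * col["length"]
            if col["name"] == name:
                f.seek(offset)
                return list(struct.unpack(f"<{col['length']}d", f.read(nbytes)))
            offset += nbytes
    raise KeyError(name)

# write_sdb("demo.sdb", {"u_ppm": [1.1, 2.2, 3.3]}, comments="toy example")
# read_column("demo.sdb", "u_ppm")   # -> [1.1, 2.2, 3.3]
```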

INTERACTIVE STATISTICAL GRAPHICS

Another major area of ALDS computer science research is interactive statistical graphics. We have acquired a fast, high-resolution color raster display unit for the project. We will initially build upon existing graphical techniques by tailoring and extending them to meet the particular data visualization needs of ALDS. We also hope to develop new, innovative data presentation and selection methods for large data sets. The following is a brief description of some of the concepts we have been thinking about.

Multiple display screens are an effective way of expanding the ability of the data analyst to view data relationships at several levels of detail simultaneously. For example, a low-resolution monochrome CRT could display an overview representation of a data set from which a specific subset could be selected and displayed in considerable detail on a high-resolution color graphics device. The overview representation would remain on the low-resolution screen as a reminder to the statistician of the context from which the detailed plot was extracted. A second multi-screen application is side-by-side comparison. Two or more similar data sets could be compared using the same analysis and display technique; alternatively, the results of two or more distinct analysis methods operating on the same data set could be compared.

Many data analysis systems use computer graphics solely for output visualization. We believe there is a great need and a great opportunity to apply interactive graphical input techniques to large data set analysis. Graphical input devices such as a light pen or a cursor control mechanism can be used to "point" to specific regions of the screen and thus visually select data points for removal, relocation, or cluster definition.

Other possible areas where interactive statistical graphics can be applied to large data set analysis include exploratory graphical fitting, dynamic scatter plots (e.g., during a single pass through a large data set), and an investigation of the value to data analysis of the use of color and shading techniques. Most of these graphics techniques are not particularly new or original; it is their application to large data sets, utilizing a fast, high-resolution color raster graphics device with the capability to rapidly and clearly display large amounts of data, which creates the opportunity for some unique and exciting research.

AREAS OF COOPERATION

Finally, I want to mention the efforts we have been making to establish and foster cooperation with other DOE laboratories and with selected universities and research organizations. (This discussion applies to both the statistics and computer science areas of ALDS.)

The ALDS project has an Advisory Panel consisting of eight gentlemen (see Table 1) recognized as experts in either computer science or statistics. Their function is to provide outside technical consultation and guidance to the project. The second annual ALDS Advisory Panel project review meeting is scheduled for November 1979. The ALDS project staff will present the current status of the project and the research, development, and implementation plans for the coming year. Discussions of the various technical areas and the overall plans and objectives of the project will then take place, followed by formal statements from each of the panel members summarizing their observations, suggestions, and recommendations regarding future directions of the project.

In addition to the Advisory Panel, we have one other consultant. Professor Will Dixon of the Department of Biomathematics at the University of California at Los Angeles provides his experience and expertise in the areas of statistical data management and the use of color graphics in data analysis.

Cooperation with other DOE laboratories has begun on several fronts. In particular, PNL and Lawrence Berkeley Laboratories have taken the initial steps to ensure hardware compatibility and to establish a computer communications link between the two facilities, thus setting the stage for some significant software and resource sharing between the two labs (ultimately extending to other labs). Other laboratories, notably Los Alamos and Oak Ridge, will participate in various phases of the ALDS statistical research, notably in data editing and screening techniques.

We have established dialogue and cooperation with researchers at the following universities, in the technical areas noted:

• Washington State University - statistical analysis of an example "large" data set (stellar parallax measurements);

• George Washington University - graphics software and standards, user interface;

• University of Toronto - data management, distributed microprocessors;

• Princeton University - statistical research; and

• University of California at Los Angeles - statistical data management, graphical data analysis.

In addition, several students at Washington State University and the University of Washington have participated in various phases of ALDS research (e.g., as summer employees at PNL), and there is some potential for thesis work related to the project. We hope for an increased involvement with research at the university level.

In conclusion, we feel that the ALDS project offers the opportunity for some very exciting research in computer science, in statistics, and in interdisciplinary communication and cooperation. We recognize the need to be cognizant of currently available software and technology and the opportunities to take advantage of expertise elsewhere within the scientific community. At the same time, we want to be in a position to share with others the large data set analysis methodology and software which we hope to develop.

Table 1. ALDS review panel

Professor Peter Bloomfield
Department of Statistics
Princeton University

Dr. Jerome H. Friedman
Stanford Linear Accelerator Center
Stanford University

Professor Jack Heller
Chairman, Department of Computer Science
SUNY at Stony Brook

Dr. Wesley L. Nicholson (Panel Chairman)
Statistics Section
Pacific Northwest Laboratory

Dr. Richard Quanrud
Assistant Director for Agriculture and Economic Census
U.S. Bureau of the Census

Dr. Arie Shoshani
Computer Science and Applied Mathematics Department
Lawrence Berkeley Laboratories

Dr. Benjamin J. Tepping
Private Statistical Consultant
Washington, D.C.

Professor David L. Wallace
Department of Statistics
University of Chicago

Help, Where Are We?*

Richard J. Beckman
Los Alamos Scientific Laboratory

Los Alamos, New Mexico

ABSTRACT

"Large data set" is defined by a set of rules. Use of a preprocessor for looking for outliers in thedata will be discussed. Questions about data postprocessors are asked. In addition, questions are askedabout the analysis of large data sets. Data structures are discussed using the National Uranium Re-sources Evaluation and U.S. Geological Survey projects as examples.

One of the most difficult problems concerning the analysis of large data sets is a definition of "large." "Large" must be defined by a set of three rules. The first rule is that there must be at least one data point. The number of data points in a data set does not necessarily make it "large." It is the complexity in the data that makes the data set "large." The second rule is that computer hardware or software must be used in the analysis. It is against the rules of large data sets to do the analysis "on the back of an envelope." The last and most important rule is that all of the data may not be seen at one time. Therefore, under this definition, a univariate time series with one million data points does not constitute a large data set.

The actual analysis of large data sets may be broken into two main areas; these are pre- and postprocessing of the data. Preprocessing consists of data audits and edits. There is much statistical research to be done in this area. For example, how do you find outliers in discrete multiway tables? Are there ways to find multivariate normal outliers? What is the role of nonparametrics in data audits? These are just a few of the statistical questions to be answered.

This area contrasts with that of postprocessing of the data. Postprocessing involves finding structure in the data. However, it is not certain that there is anything to do in postprocessing other than the usual statistical analysis. It is just not clear that there is any statistical research to be done in this area.

*Report LA-UR-79-3192, Los Alamos Scientific Laboratory, Los Alamos, New Mexico. Work supported by the Department of Energy under Contract No. W-7405-eng-36.

There are three questions that should be asked in the field of large data sets. First, can the data structure help find structure in the data? By data structure we mean the organization of the data. For example, multivariate data may be organized as cases by variables. For every case and variable combination, there is a data point. Another data structure is the nested data structure, where groups of variables are nested under other variables. An example of this type of structure is given in the uranium hydrogeochemical and stream sediment reconnaissance project of the National Uranium Resources Evaluation program. In this project, several water samples are taken at specific locations, and a chemical analysis is made for 43 elements including uranium. The data are stored so that components that are specific to the location are level-zero variables. These variables are such things as latitude, longitude, rock type, rock color, and relief. Nested under these variables are level-one components, which include concentrations of the 43 elements for the samples taken at the specific location. A third type of data structure is the networked data structure. In this type of structure, data vectors are "neighbors" to each other. In other words, data vectors are in some way connected to other data vectors. An example of this type of data structure is given by an offshore oil and lease data base maintained at the Los Alamos Scientific Laboratory for the U.S. Geological Survey. In this data base, data vectors are connected by physical location. The question, therefore, is, given a specific data structure, does this structure help with the statistical analysis? Should techniques be developed for different data structures?
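One minimal way to picture the level-zero/level-one nesting described above is sketched below; the field names and values are invented stand-ins for the NURE hydrogeochemical records.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Sample:                          # level-one record: one water sample at a location
    concentrations: Dict[str, float]   # element symbol -> concentration, e.g. {"U": 3.1}

@dataclass
class Location:                        # level-zero record: attributes of the site itself
    latitude: float
    longitude: float
    rock_type: str
    samples: List[Sample] = field(default_factory=list)

site = Location(36.0, -84.3, "shale")
site.samples.append(Sample({"U": 3.1, "Th": 9.8}))
site.samples.append(Sample({"U": 2.7, "Th": 10.2}))

# A level-zero question ("which sites are shale?") and a level-one question
# ("what is the mean uranium concentration at this site?") use different levels of the nest.
mean_u = sum(s.concentrations["U"] for s in site.samples) / len(site.samples)
```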

A second question may be asked in the area of large data sets: Is the eye quicker than the hand? In other words, what is the role of computer graphics in the analysis of large data sets? It has been said that a picture is worth a thousand words. No one has ever said that a picture is worth a thousand statistical analyses. How many statistical analyses is a picture worth? Is it worthwhile, in the cases-by-variables data structure with 100 variables, to look at all 4950 two-variable plots?

The last question that must be asked in this area of large data set analysis is probably the most important for the statistician. Is there any statistical research to be done, or is the research all in the area of computer science?

A Small Shopping List for Large Data Set Analysis

Leo Breiman

Santa Monica, California

LARGE IS NOT THE PROBLEM

The difficulty is complexity. For instance, select 10^8 data points from a normal distribution. It's large but neither very interesting nor challenging nor complex.

Complexity is a result of size, dimensionality, and structure. A rough territorial definition of a large complex data set might be more than 100 variables and more than a megabyte of storage.

As I thought about my wish shopping list for complex data base analysis and thought about other statisticians I have heard describing their own Christmas lists, it came to me with the clarity of Scrooge's vision that there are many lists — very personal and very idiosyncratic. My list, like others, is a reflection of my own experience with a limited number of large and complex data bases.

SOURCES OF LARGE DATA SETS

Large data sets I have known have come from three sources. The first is record-keeping information systems. These are large computerized systems, originally designed for housekeeping record work. Much later, someone pokes at them in the often forlorn hope that they can be used for statistically significant studies.

Examples of such systems are insurance company files, hospital records, large corporation employee records, etc. For instance, the Colorado State Court computerized record-keeping system enters, on line, almost all of the information regarding each of the over 100,000 cases filed yearly.

Record-keeping data bases are generally very large, filling tens or hundreds of tapes. The records are of variable dimensionality and mixed data types. The major problem is data quality. The records are replete with missing, miscoded, or otherwise erroneous entries. The major effort, both in time and money, is in upkeeping the data base. Too little effort is generally budgeted for quality control on the data. If there is statistical output, it is simple and descriptive. These systems were not designed to answer statistical questions but very often are tempting to statisticians simply because they are there.

A second source of large complex data bases is special surveys and observational studies; some examples are the Rand health insurance study, the DOE household energy consumption study, and the Bureau of the Census 50,000-household surveys.

The characteristics of these bases are smaller sample sizes (generally in the 10^3 to 10^4 range), much better data quality, a large number of variables, mixed data types, and dimensionality not as variable and often fixed. Most important, the data base was designed for statistical purposes.

Finally, some statistically useful data bases are being put together through information systems that have evolved past the record-keeping stage and into gradually upgraded data quality and capabilities. One example is the Los Angeles Air Pollution Monitoring System, which has been in existence and evolving for over a generation. Large volumes of information, concerning both numerous pollutants and meteorological variables, collected many times each day, are telemetered to a central computer from dozens of monitoring stations. The data quality has gradually but significantly improved, and the system is currently in a stage where the data are useful in a variety of statistical studies.

DATA GATHERING

By the time the data gathering is completed, at least three-quarters of the battle is over. If the information desired cannot be separated out of the data, if it is too noisy, if the quality is too low, or if the sample is not well designed, then no amount of analytic ingenuity can rescue the study.

Hopefully, statisticians will become more intimately involved with the gathering of relevant high-quality data. This is a significant role. Statisticians look upon data gathering with a perception different from that of nonstatisticians involved in the same project. Their view is more retrospective. That is, they view the data gathering knowing what methods are available and what problems may arise in analyzing the data after it has been gathered. Statisticians are aware that their chief antagonist is variability or noise and that as much as possible the data gathering should be designed to either eliminate, control, or estimate the noise levels. This concept, which seems so simple and natural, is often surprisingly novel to deterministically minded people.

If the statistician is not in on the initial design and all phases of the data gathering - that is, if somebody stacks a pile of tapes on your desk and says, "Well, here it is," then the first essential is to become thoroughly acquainted with the data base. A good healthy statistical attitude involves a mild case of paranoia. Believe no one concerning the state of a data base! Practicing statisticians often sit around swapping horror stories about data bases. Everyone has his own favorite, roughly titled, "How I got stung by assuming that my data was okay, but what happened was . . . ."

When a reasonable assessment is made about the quality of a given data base and serious deficiencies are uncovered, then the decision that may have to be made is to abandon the data set and start from scratch, instead of putting time and energy into a salvage operation. Sometimes, much fruitless effort can be saved by giving up sooner instead of later.

WHAT MAKES A DATA SET COMPLEX?

Complex data sets are characterized by too many variables with

• mixed data types,
• nonstandard data structures, and
• nonstandard relationships.

That is, complexity is added when the problem is high-dimensional and, also, the data contains a fair number of both categorical and ordered descriptors; when the data does not occur in a case-by-variable rectangular data structure but instead, for example, the dimensionality is highly variable or the intrinsic structure is hierarchical or nested; and when the relationships between variables occur in forms that make standard descriptors of variable relationships noninformative. Another factor that adds complexity is nonhomogeneity, that is, different relationships between variables in different pieces of the data set.

APPROACHES TO THE ANALYSIS

The goals of the statistical analysis are to explore and quantify the relationships between selected variables in the data base. With small low-dimension data bases, often the goal is to answer a single question, that is, to test a single hypothesis. In large complex data bases, the statistician is usually after many diverse pieces of information. Many different types of analyses may be necessary and appropriate. The fabric of the data is much richer.

The usual statistical approach to an analysis has its emphasis on the discovery of truth. A model for truth is formulated, and then the analysis proceeds on the basis of the model.

Usually, it is not only truth that is wanted, but the whole truth. That is, the model leaves unspecified only a small number of parameters. Then the data is used to provide locations for these unknown parameters.

For most large complex data bases, this procedure is neither reasonable nor productive. Usually, one begins pretty much in the dark. It is not really sensible to start worrying about models for truth when what is really required is some rudimentary understanding of a complex phenomenon.

The approach that is used in practice is largely descriptive. While this is not exactly a secret, there seems to be widespread embarrassment over the fact that the methods used are ad hoc and heuristic as judged by commonly accepted statistical standards.

The discrepancy between theory and practice is that it is not often possible to "discover the truth." In practice, the statistical endeavor becomes more modest and aims only at extracting some relevant information from the data base.

In the use of standard statistical procedures, the analysis proceeds by a sequence of static stage settings. A model is assumed, a hypothesis is tested, confidence intervals are computed, and so on. The goal, however, of finding relevant information in a complex data base requires a much more fluid interaction, which moves ahead by trial-and-error exploration, using descriptive techniques.

The important information may not turn out to be the answer to the present questions. An analysis that attempts to answer some questions that originally seemed very straightforward may wind up raising many other questions that are significant and relevant to the phenomenon being studied.

Working in this area requires a willingness to do intelligent exploration in a scientific spirit, instead of trying to prove that the data supports this or that point of view.

Because the primary goal is aimed at understanding the nature of the problem and the data, standards for a good analysis might be

• simplicity,
• usefulness,
• production of insights, and
• unbiased conclusions.

Even though many of the techniques used are descriptive, the important statistical issue of bias in the conclusions has to be looked at. What does bias mean if no underlying model is defined? The concepts of dividing a data set randomly into a learning set and test set, or of cross-validation, or of "bootstrap" statistics suggest that the definition of "unbiased" conclusion might be roughly formulated as hypothetical replications of the data base leading to similar conclusions. This gets around the specification of a model but still uses it as a conceptual framework. Even though we are broadening the goals of an analysis to a general program to separate signal from noise, we still need to know when our conclusions are based on the noise instead of the signal.

SOME PRELIMINARY SHOPPING ITEMS

The shopping list that follows is pretty much potluck. It begins with good data-handling capabilities - specifically, automatic subsampling and sampling on Boolean combinations of keys.

Subsampling means the selection of a random or stratified subsample of the full data set. Because a large data base is often too expensive to work with as a whole, smaller subsamples have to be selected. These may be more informative if they are stratified on selected variables.

Along this line, a convenient capability is to be able to look at the subset of all data points satisfying some Boolean combination of conditions on the variables, that is, to look at all data points such that x1 < 3 and xn = A or x9 > 7. Similarly, being able to subsample from the set of all data points satisfying such Boolean conditions is a convenient capability.
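A hedged sketch of this capability over a plain list of records (the variable names echo the example above, and the thresholds and values are illustrative):

```python
import random

def select(data, condition):
    """Return all cases satisfying an arbitrary Boolean combination of conditions."""
    return [row for row in data if condition(row)]

def subsample(data, n, seed=0):
    """Random subsample of the selected cases."""
    rng = random.Random(seed)
    return rng.sample(data, min(n, len(data)))

# Hypothetical cases with variables x1, xn, x9:
data = [{"x1": random.uniform(0, 10), "xn": "A", "x9": random.uniform(0, 10)}
        for _ in range(100_000)]

hits = select(data, lambda r: (r["x1"] < 3 and r["xn"] == "A") or r["x9"] > 7)
piece = subsample(hits, 500)      # work with a manageable piece of the selected subset
```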

Once a subsample is selected, there should be lots of different ways of forming descriptive summaries of it - for instance, selected cross tabs. Graphic capabilities are nice, but, for example, with 100 variables, there are about 5000 possible scatter plots of pairs of variables. Graphics will have to be designed more thoughtfully to be useful for large data sets.

Other handy items are ways to find out how the subsample differs from the main body of the data. For example, in a recent Los Angeles air pollution data base, there are hundreds of meteorological variables. Of the 1000 days represented, a few dozen have oxidant levels above the EPA-defined Stage I alert level. In what ways does this group of days differ from the main body of the data? Which meteorological variables are higher or lower on these days than on the rest of the days? If the variable is categorical, how does its distribution differ from the main body?

The answer, at least with the air pollution data, is that the most significant differences seem to turn up in nonlinear combinations of variables. The difficult part is finding the right combination of the right variables.
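A sketch of the simpler single-variable screening step (ranking variables by how much the chosen subset differs from the rest of the data); the variable names are placeholders, and finding the nonlinear combinations is the part this does not do.

```python
import statistics

def screen(subset, rest, variables):
    """Rank variables by the standardized difference between a subset and the rest of the data."""
    diffs = {}
    for v in variables:
        a = [r[v] for r in subset]
        b = [r[v] for r in rest]
        pooled = statistics.pstdev(a + b) or 1.0     # guard against zero spread
        diffs[v] = (statistics.mean(a) - statistics.mean(b)) / pooled
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)

# alert_days = [r for r in days if r["oxidant"] > stage1_level]
# others     = [r for r in days if r["oxidant"] <= stage1_level]
# screen(alert_days, others, ["temp_850mb", "inversion_height", "wind_speed"])
```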

Because the right combinations may combine both ordered and categorical values, this brings up a point worth expanding a bit. In looking for analysis methods, an important criterion is that mixed data types can be handled in a natural way. Unfortunately, most available techniques either force categorical variables into a mold of numerical variables or conversely.

Of course, one needs some data-editing capability to find erroneous or missing data. In addition, there is also the strange point (outlier) problem. Most large data sets contain some strange points, that is, unusual multivariate combinations. An efficient algorithm for locating strange points would be a welcome addition. But once one rejects the use of multivariate normal thinking, even the problem of defining strangeness is difficult.

A final preliminary problem concerns missing values. With over 10^2 variables, often a large proportion of cases in the data base will have at least one variable missing. The easiest way to handle this problem is to patch in (replace) missing values, which brings up some interesting and unresolved issues. What is the best way to patch? What is the size of the error introduced by patching? Is the data missing at random or in a pattern? If it is missing in a pattern, can the patching be done to account for the pattern?

The answers to these questions are largely unknown even in the case where all variables are numerical and even if a multivariate normal distribution is assumed.
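One very simple patching scheme, offered only as a strawman against which the questions above can be weighed: column medians, with the missingness pattern kept so its structure can be examined afterwards. The records and variable names are invented.

```python
import statistics

def patch(rows, variables, missing=None):
    """Replace missing entries by the column median; return patched rows and a missingness mask."""
    medians = {v: statistics.median(r[v] for r in rows if r[v] is not missing) for v in variables}
    mask = [{v: (r[v] is missing) for v in variables} for r in rows]
    patched = [{v: (medians[v] if r[v] is missing else r[v]) for v in variables} for r in rows]
    return patched, mask

rows = [{"x1": 1.0, "x2": None}, {"x1": 2.0, "x2": 5.0}, {"x1": None, "x2": 7.0}]
patched, mask = patch(rows, ["x1", "x2"])
# Inspecting the mask (e.g., do certain variables tend to be missing together?) is a first
# check on whether the data are missing at random or in a pattern.
```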

WANTED: GOOD DISMEMBERMENT TECHNIQUES

A complex data base can be nonhomogeneous in the sense that there may be different relationships between variables in different regions of the data set. The next shopping list items are to find ways to detect nonhomogeneity and slice up the data into homogeneous pieces.

As an overly simple example, suppose that

for x1 < 3, ρ(x1, x2) = 0.8,

for x1 > 3, ρ(x1, x2) = 0.2.

The data looks like the figure below. The problem is to detect the change in the correlation and estimate the change point.

[Figure: scatter plot of x2 against x1, with the correlation weakening for x1 > 3.]

A simpler problem is to find dismemberment techniques using a response variable. Here the nonhomogeneity of interest is that there may be different functional relationships between the response variable and the independent variables in various parts of data space. I know of two classes of dismemberment techniques for this situation:

• tree-structured binary splits [1], and
• piecewise linear or polygonal splits.

Tree-structured binary splitting algorithms work in the following sequence of steps:

• Let H_N(Y) be a measure of homogeneity of the response variable Y for a subset N of the data.

• Start with all the data. Search over all possible binary splits of all independent variables; that is, for x_j ordered, search all splits of the form x_j ≤ A, with -∞ < A < ∞; for x_j categorical, taking values in {1, . . . , J}, search over all splits of the form x_j ∈ J_j, where J_j ranges over all subsets of {1, . . . , J}. Each binary split of a variable splits the data set into N1, N2.

• Goodness-of-split is the increase in homogeneity after the split into N1, N2. That is, if the sample size in N is n, in N1 is n1, and in N2 is n2, then put

  ΔH = (n1/n) H_N1(Y) + (n2/n) H_N2(Y) - H_N(Y).

• Find the split which maximizes the goodness-of-split ΔH.

• Now repeat this procedure on N1, N2, etc.

The procedure can be pictured as in the tree below. This corresponds to cutting data space into rectangles with sides parallel to coordinate axes.


[Figure: tree of successive binary splits, with branches labeled by conditions such as x_j ≤ A or x_j ∈ (A, C).]
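A minimal sketch of one level of this search, using within-node variance of the response so that the gain is the weighted reduction in variance; this particular choice of homogeneity measure and the brute-force threshold search are assumptions of the sketch, not Breiman's specification.

```python
import statistics

def var(ys):
    """Within-node variance; treat homogeneity as the negative of this quantity."""
    return statistics.pvariance(ys) if len(ys) > 1 else 0.0

def best_split(rows, y_name, x_names):
    """Search all binary splits x_j <= A and return the one maximizing the gain."""
    y_all = [r[y_name] for r in rows]
    n, h_parent = len(rows), var(y_all)
    best = (0.0, None, None)                          # (gain, variable, threshold)
    for x in x_names:
        for a in sorted({r[x] for r in rows})[:-1]:   # candidate thresholds (exclude the maximum)
            left  = [r[y_name] for r in rows if r[x] <= a]
            right = [r[y_name] for r in rows if r[x] > a]
            gain = h_parent - (len(left) / n) * var(left) - (len(right) / n) * var(right)
            if gain > best[0]:
                best = (gain, x, a)
    return best                                       # recurse on the two pieces to grow the tree

# Example: rows = [{"x1": 1.0, "x2": 0.3, "y": 2.3}, ...]; best_split(rows, "y", ["x1", "x2"])
```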

Another dismemberment technique is through the use of piecewise linear regression. If a continuous least squares fit of the response variable by the class of piecewise linear functions of the independent variables is carried out, the result is a data-directed cutting of data space into polygons with a linear fit in each polygon. Although algorithms exist for doing this, they are computationally too expensive for high-dimensional data sets and are not suited for mixed data types.

General dismemberment techniques without a response variable do not seem to be around. Heuristic trial-and-error splitting can be used. Systematic methods would be welcome.

WANTED: GOOD COMPLEXITY REDUCTION TECHNIQUES

Assume that the data is roughly homogeneous and that by subsampling we are down to 10^3 samples or less but with a large number of variables. My next-to-last shopping list item is to find ways to reduce the number of variables drastically with no loss in information.

Some standard approaches to this are the following:

1. with a response variable - stepwise regression and stepwise discrimination,

2. without a response variable - principal components and factor analysis.

There are some serious drawbacks to the above methods:

• Stepwise methods often mask information and give misleading results.

• Principal components include all variables and give a complicated analysis.

• Factor analysis is expensive and usually limited to 100 variables.

• All methods assume linear relationships and operate on the covariance matrix.

• Categorical variables are handled in a clumsy way.

Some other more or less specific shopping items in complexity reduction are as follows:

1. A method for determining when variables are superfluous. Usually, a variable is defined to be superfluous if it can be predicted accurately from the remaining variables. What about a few large residuals? Can these be significant?

2. A generalized kind of factor analysis that works on mixed data types, can handle a large number of variables, and can handle nonlinear relationships.


3. An intelligent bivariate relationship measure for mixed data types that can be efficiently computed.

Rank order correlations are a step in the right direction, but their computation requires sorting, and an extension to mixed types is not clear. Another possibility is in the direction of measures of association for contingency tables, but these depend on the size of the table and cut points if the variables are ordered.

A natural bivariate distance measure could be helpful in a number of ways. For instance, using multidimensional scaling, two-dimensional displays of the variables could be constructed with the response variable (if there is one) at the origin. The variables could be clustered in a manner similar to the method now used by BMDP, based (usually) on a correlation distance.

WANTED: A MERGER

Computer usage is almost synonymous with large data set analysis. Computer power opens interesting and exciting possibilities for the development and use of new data-analytic techniques. Computing power, once we become familiar with its use and potential, may have a strong formative effect on the way we think about data set analysis.

Good programmers and good computer centers are essential ingredients both in working with complex data sets and in developing new technology, but university computer centers are generally not set up so that statistics departments can conveniently work with data sets. Neither are graduate statistics students crackerjack programmers. These conditions make it difficult for statistics departments to participate significantly in the development of techniques for complex data set analysis. This unfortunate result puts even more of a barrier between theory and practice.

To make my shopping list more of a combination of what is attainable and what is hoped for, my last item is the acquisition by statistics departments of good computers, good programmers, and interesting complex data sets.

REFERENCES

1. L. Breiman, "Growing Trees to Analyze High-Dimensional Data," pp. 3-29 in Proceedings, 1978DOE StatisticalSymposium, November 1-3,1978, Albuquerque, New Mexico, July 1979.


WORKSHOP III - DISCUSSION

Thomas Gilbert, Argonne National Laboratory. Dr. Breiman's suggestion that models should not be used in analyzing large data sets is a bit disturbing. It is my understanding that the underlying purpose of scientific activity is to discover comprehensible relations between observables. In large data sets, these relationships will usually be subtle patterns buried in "noise." I doubt that such patterns can be found without using models. For example, it is unlikely that the few tracks that identify a new elementary particle could be found among the thousands of bubble chamber photographs collected in a high-energy physics experiment without precise physical models to guide the search. The situation for the more complex systems encountered in economics and sociology is different in that the number of observables one must take into account is much larger. But I doubt that the subtle relationships we need to find to make the data comprehensible will be discovered without using models for guidance.

Wes Nicholson. Models can be used very differently at distinct stages of a data analysis. I wonder whether that is not the issue here. To illustrate the bubble chamber example, models, or more precisely, pictorialized physics theory, is needed to guide the search for new elementary particles. Suppose present theory says that bubble chamber particle tracks should bend to the right, so we direct our attention toward isolating tracks that don't bend to the right. The use of models as guidance is essential for efficient use of the statistician's time during the early stages of a large data set analysis - the exploratory stage.

Later in the analysis, the set of tracks that don't bend to the right are further characterized by classical model fits. Now we determine particle mass, particle velocity, etc., and estimate precision. Out of this classical analysis comes a definitive statement concerning the evidence for new elementary particles - the confirmatory stage.

The bag of tools for isolating the set of tracks that don't bend to the right will have a small intersection with those for classical model fitting. It is the exploratory stage of the analysis that we are thinking about when we say goodby to classical statistics.

Gary Tietjen, Los Alamos Scientific Laboratory. I disagree with everything that has been said. I'm of a different school of thought. As far as automatic data editing goes, the statistician can never, world without end, detect outliers (which do satisfy the constraints) automatically, without looking at his data. What we need, then, are good ways of looking at the data. We then need intelligent ways of dividing the data into chunks which retain the original patterns, along with a collection of war stories so we know what to look for. What we need is not a life buoy, but to drain the swamp.

Dan Carr. I'd like to respond. I was a bit worried when you said you disagreed with everything. However, I am in substantial agreement with most of your statements. We may have a difference of opinion or definition on the issue of detecting outliers automatically without "looking" at the data. I use the computer to "look" at the data and believe that in many ways it extends my perception.

When I am able to translate my ideas of what constitutes outliers into an algorithm, the computer can flag the instances it finds. Under my control, it can provide graphic views of points identified as outliers that both aid me in interpretation and help me assess the adequacy of my initial ideas. I would agree with all of the following interpretations of your statement. (1) Outlier identification is an art, and we are not able to translate all our rules into algorithms. (2) Rules for identifying outliers and the interpretation of results need to be data set dependent and will evolve as we understand the data. (3) We are frequently tempted to trust "easy" computer results because it is relatively cumbersome to do the alternative analysis and checking that is deserved. Thus, while our predispositions toward computer usage may be different, I suspect that our ideas concerning what constitutes a reasonable analysis are not that far apart.
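As one concrete (and hypothetical) instance of translating an outlier rule into an algorithm, a rough sketch might look like the following; the median/MAD cutoff and the factor k are illustrative choices, not the rules used on the ALDS project.

```python
# Sketch of a rule-based outlier flag of the kind described above.
# The median/MAD cutoff and the factor k are illustrative choices only.
import numpy as np

def flag_outliers(x, k=4.0):
    """Return indices of values more than k robust deviations from the median."""
    x = np.asarray(x, dtype=float)
    med = np.median(x)
    mad = np.median(np.abs(x - med)) or 1e-12   # guard against zero spread
    return np.where(np.abs(x - med) > k * 1.4826 * mad)[0]

data = np.concatenate([np.random.default_rng(1).normal(size=500), [8.5, -9.0]])
print(flag_outliers(data))   # flags the two planted values at indices 500 and 501
```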

Vic Kane, Union Carbide Corporation, Nuclear Division. I'd like to understand a little bit better the role that the statisticians working on this project see for statistics. In particular, let me just list some of the statistical topics that have been mentioned. These are sampling, variable selection in both regression and discriminant analysis modes, dimension reduction, model approximation, missing value estimation, discrete and categorical analysis, clustering of both variables and samples, distributional fitting, and I think the list could go on. I agree with the list and that there are a large number of problems. However, I think that at least one focus of DOE is to try to reduce the problems, so I don't really understand what you have in mind. I disagree, at least in some part, with the definition of large, and I think that this disagreement would clarify why you would have this large list of statistical topics. The last two authors said essentially that large P and large N are not necessarily statistical problems. They may be computing problems. I'm not sure the first speaker said this. There is some disagreement there. I agree that large P and N are not statistical problems.


I think that complexity is the issue. Let me give another definition of a large data base. I would define a large data base as a collection of possibly nondisjoint small data bases, where a small data base is one that can be addressed in a unified statistical manner using classical statistical methods. I think the problem is getting from the large to the small. I really wonder about your supposition that there is a unified statistical theory in which we can go from large to small. I see problems in time series, in clustering, in data editing, data validation, discriminant analysis, in all these large data bases, and I really wonder what's the commonality that you have in mind?

Dan Carr. I would like to mention a different aspect of the large problem. Someone else may be better able to directly address your question. Cutting a large data set into a bunch of homogeneous small data sets is definitely one aspect of the problem. But if there are many small data sets, a description of all these small data sets is a problem in itself. How do we pool information from all these small data sets to present an overlying pattern? If each data set has a little bit of a story, what is the whole story? I think there is an added dimension to consider besides splitting, and that is pooling and summaries.

Dick Beckman. Splitting might be great, but let's take a look at another example. Here I have a data base over one quad in the NURE program. Two thousand guys, each a sample location, spread out here. Let's suppose a repeated sampling of five or six times for each one of these. What do I split? Is each one of these individual guys a data base? Am I down to the raw data? Your question was a lot like mine. What the heck are we doing? I'm not sure, but splitting to me might cause some problems. I'm not so sure we can get a unified theory of splitting when we're talking about structures that look like this.

Wes Nicholson. I'll attempt to answer some of your comments. We have a panel that reviewed the ALDS project last year. I made a comment to that panel that I'll make again here: If this project is going to succeed, the people that are involved have got to get rid of their egos. Much of this project isn't fun, and it's not going to be publishable in the open literature. It is just going to be a lot of dog work. But what we are trying to do is to eliminate the dog work in general large data set analysis. People that have been involved in large data set analysis know it is a real chore. After you have analyzed a few large data sets, you wonder whether or not you want to get involved with another one. Maybe I would rather go off and play with a hundred data points and do something classical.

First of all, work will be centered around computer science to provide better ways, faster ways, of doing what we've always been able to do in the past on the back of the envelope. We would like to be able to do it in an interactive mode so that we don't have to resurrect our thinking on the same problems every month when the batch jobs come back to us and we've been thinking about many other things in between. That's not particularly exciting and that's not publishable. It is the sort of thing that we really feel we have to do. These are the aspects which people interested in doing research don't want to do. What we want is to have a process for handling large data sets which is easy and efficient time-wise for the statisticians. Now I was a little bit surprised at the dismissal of the case of just large N. The dismissal of that case seems to assume that you can reduce it to a previously solved problem by sampling. On the other hand, you have a lot more information available to you. It's a question of what can you do with that large N that you are not able to do with a small N? And can you do it in an efficient manner? One thing that has been mentioned today was that you can do very high-precision diagnostics on the tails. But now you have either to look at the entire data set or do some very clever sampling procedures to estimate tail characteristics.

David Margolies, Lawrence Livermore Laboratories. Is it possible to answer a question about Los Angeles pollution such as, "What percentage of pollution is caused by automobiles?"

Leo Breiman. I don't think anybody can answer that question with any precision. I think you've got different guesses. You will also get different answers from different people. One of the difficulties is how do you split, knowing there is a lot of air pollution and a lot of ozone. How do you split contributions? For instance, in one interesting approach, if you'll permit the digression for a moment, people suspected that they could find what air pollution was by comparing weekday levels to weekend levels in locations where automobile traffic dropped significantly. Well, what happened was that in a number of cities they found higher ozone levels on weekends than they did during the weekdays. The difficulty is that the situation is so complex and the chemical reactions are so complex and depend upon meteorology in such a complex way that at the present time I wouldn't have any confidence in any estimates as to the relative contributions.

Doug DePriest, Office of Naval Research. During your presentation, you discussed the problem of working with a data base generated from different sources and whether or not to utilize the data. You suggested, in some instances, nonutilization of the data. What statistical approaches would you recommend for analyzing a large data base derived from several sources?

Leo Breiman. It is interesting because I did have a brief skirmish with some of the Navy data I got last year. I can't answer you directly because I don't know enough about the differing data bases, but I think that's where the statisticians' intrinsic judgment and feeling for the quality of data enters in. I think that is a judgment call that can't be answered in terms of classical statistics. It has to be answered in terms of digging into the data, checking it out as carefully as possible, feeling out how good it is and how useful it is going to be and how much you can actually retrieve from it. I think your answers are going to be different depending on what you find out. But basically, it is going to come down to a judgment call. At the last ASA conference, there was a discussion on the health insurance data bases that the health people are trying to put together for various hospitals and insurance plans. They finally concluded that the situation was so impossible they'd have to go out and do fairly large special-study surveys to get the data they needed. That is one example in which they didn't junk the data bases, but they determined that they just couldn't use them for what they needed. I know I haven't answered your question directly, but I've answered as best I can.

Bob Easterling. I'd like to join Leo Breiman in saying goodby to "classical," theoretical statistics. I'd like to add that what he said about the absurdity of the "model as truth" notion applies across the board - small data sets or large data sets, however they're defined. We've got to realize that models are just tools for the reduction or description of data. They're not truth. Somebody once said, "models are to be used, not believed," and that should be our attitude.

The fundamental problem of statistics, as I see it, is to discover and convey information. This is the case for both small and large data sets. The ideas and views expressed here should help us think about all data sets - large and small. I think we need to be careful in thinking "large" requires different ideas from small. The ideas are the same; the methods are different.

Thomas Gilbert. There appears to be a misunderstanding in the use of the word "model." Physicists use the word to characterize any logical construction, expressed in equations, diagrams, words, or whatever is appropriate, that enables one to comprehend the relation between a set of physical observables. Models are not "the truth," but the crucial question is whether a model is a "true map" of some aspect of nature in the sense that it describes the relation between some set of variables within specifiable limits of accuracy under specified conditions. All activities of science are directed toward the construction and verification of such models. They are almost never found merely by analyzing data taken without regard to some model. Historically, models are nearly always generated by intuitive insight and then verified by data collected from experiments that are carefully planned with a model in mind, where possible, or from carefully but honestly selected records when this is not possible. In view of the central importance of models, it appears to me to be unwise to ignore them when generating or analyzing large data sets.

Gary Tietjen. In explaining my remarks, I advocated using sampling to play with the data, to formulate ideas of what to do, to understand the structure of the data, to use them for training sets. Then, when you have decided what to do, you throw in all the data and proceed with your analysis.

John Schuenemeyer, University of Delaware. Let me suggest that a survey be made of those analysts who work with large data sets. This survey would include the definition of large data sets, the specific sizes and types of the large data sets that have been analyzed, the computer software and hardware used, and the desired hardware and software. Also, I would like to see an effort made to incorporate into a package some of the excellent graphics and other software developed by the research laboratories.

Wes Nicholson. You've paraphrased what we are doing on the ALDS project. We've had a bit of difficulty finding out who the people are. Some are obviously analyzing large data sets and some aren't. We have visited sites both here and in Europe. Interestingly, we find that the same type of data-directed, ad hoc tools are being put together by all these large data set analysts. The details are different, but they are all doing the same type of thing. Many of the tools have a lot to do with what Leo calls dismemberment. That concept flourishes among people who are analyzing large data sets.

Dick Beckman. I'd like to say one thing about the horror stories. Most of my horror stories have been people interactions. People go out and change sample sheets, for example, and don't tell me. Then all of a sudden "No. 3" isn't what a "No. 3" was the day before. Three months down the road I suddenly discover, gee, I've got 40,000 No. 3's, and I'm only supposed to have 20,000 of them. What happened? It is that kind of horror story that turns me gray. I used to have black hair before I started this data analysis business.

Well, it's one o'clock, so let's stop and have a break for the afternoon. I'd like to thank everybody for entering into the discussion and also thank our speakers.


Workshop IV
Resource Estimation

Larry S. Mayer, Organizer

INTRODUCTION

Welcome to the session on resource estimation. The problem to be addressed in this session concerns estimating the amount of recoverable resources of oil and gas to be found in the United States, in the world, or in a particular basin. One aspect that will be indirectly addressed is what and how statisticians can contribute to the solution of this problem. I think we have an excellent group of speakers here: Jack Schuenemeyer from the University of Delaware, David Root from the U.S. Geological Survey (USGS), and Peter Bloomfield from Princeton University; our discussant is Lou Gordon from the Office of Energy Information Validation of the Energy Information Administration. The papers are going to be rather short, about 35 minutes in length; after each paper, questions that deal with the technical aspects of the paper may be raised. On the other hand, general discussions of the paper ought to wait until after the discussant has had a chance to comment. Toward the end of the session, we shall have a general discussion of the problem of resource estimation, and people may direct questions to any of the speakers, make their own comments, or ask general questions.

Our first speaker is Jack Schuenemeyer, a University of Georgia graduate in statistics. He has worked at the USGS and is now on the faculty of the University of Delaware in Newark.

Our next speaker is Peter Bloomfield, who did both his undergraduate and graduate work at Imperial College under D. R. Cox. He then came to Princeton University, where he is currently Professor and Chairperson of the Department of Statistics. He spent a year involved in the Princeton Resource Estimation and Validation Project.

One of my charges in organizing this session was to have a nonstatistician, and so I invited Larry Drew, a geologist at the USGS, twice. He was involved as part of the Drew and Schuenemeyer piece, and he was invited as part of the Drew and Root piece. Well, we got both pieces, but we didn't get Drew. With that I want to introduce David Root. David works with Larry Drew at the USGS in Virginia. Actually a probabilist by education, he graduated from the University of Washington at Seattle and taught at Purdue University. The title of this talk, which Drew claimed I couldn't remember, is "How Do You Know It's There If Somebody Hasn't Found It?"


An Analysis of a Statistical Model to Predict Future Oil

John H. Schuenemeyer
University of Delaware
Newark, Delaware

ABSTRACT

Many techniques are used to estimate the total amount of oil remaining to be discovered. However, another important problem is the estimation of the oil that may be found by the next increment of exploratory effort.

The search for oil may be characterized as a trial and error process in which fields are found by the siting of exploratory wells. When the search area is restricted to a geologically homogeneous region, called a play, the historical drilling and discovery data exhibit sufficient statistical regularity to permit estimation of future oil by field size class.

The form of the statistical model discussed in this paper is based upon two important observations, namely that large oil fields tend to be found during the early stages of exploration and that most of the oil is in the large fields. These factors permit the formulation of a forecasting model based upon the discovery process. The measure of search effort in this model is the area exhausted or condemned by exploratory wells. Procedures to estimate the discovery efficiency and the effective play size from the data are discussed. The model uses these estimates to predict future oil by field size classes for some future number of exploratory wells.

INTRODUCTION

Before discussing the petroleum supply model developed by Drew, Schuenemeyer, and Root [1], I'm going to briefly mention some other supply models in order to place our work into perspective and to illustrate the reason for the choice of the structure of our model.

Probably the best known of the petroleum supply models is that of M. King Hubbert [2]. He graphically fit the U.S. petroleum yearly discovery data to a logistic curve and made estimates of the ultimate producible petroleum and the year of peak production. The curve-fitting approach has since been used by many other investigators to estimate ultimate oil. This approach has the advantage of requiring very little data and being computationally simple. However, there are several important questions related to resource estimation and policy planning which cannot be answered by curve-fitting techniques: How much oil remains in partially explored basins? What is the size distribution of the remaining oil fields? How many fields in a given size class can one expect to find by drilling some future number of exploratory wells? The answers to these questions are needed to estimate the marginal finding and development costs of future oil.

DISCOVERY-PROCESS MODELS

Models that explicitly incorporate the primitive assumptions which govern the finding of oil have been proposed to address these questions. In addition to the model developed by Drew, Schuenemeyer, and Root [1], two other discovery-process models will be discussed briefly. The form of these discovery-process models is based upon the physical nature of the discovery process (see [3]). The central assumption is that in the exploration of a geologically homogeneous region, the larger fields tend to be found early. For example, in the Midland basin in west Texas, all 18 of the 100-million-barrel fields were discovered by 1954, when 37.1% of the 1974 total wells had been drilled. The average field size in BOE (barrels of oil equivalent) was almost 60 million in 1941 in the Midland basin. By 1954, this figure had declined to 5 million barrels per field and through the end of 1974 had fluctuated between 1 and 2 million BOE per field. This type of decline is representative of that observed in other basins, both onshore and offshore. All of the discovery-process models to be discussed use this central assumption and can be used only to make predictions in a partially explored play.

Perhaps the discovery-process model best known to statisticians is that of Barouch and Kaufman [4]. In this model, they assume that in a homogeneous geological region (a play) there are N pools of volumes V_1, . . . , V_N, and that V_i, i = 1, . . . , N, is lognormally distributed. They also assume sampling proportional to size without replacement, where size may be volume or area. By this assumption, the probability of discovering V_1, V_2, . . . , V_n, where n < N, in that order is

∏_{i=1}^{n} V_i / (V_i + V_{i+1} + · · · + V_N) .

The input to this model is the ordered sizes of discoveries from a partially explored play. The output is the estimated sizes of discoveries in the remainder of the play. The wildcat-well success ratio necessary to estimate future discovery rates is determined by sources outside the model.
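As a small illustration of the size-proportional sampling assumption, the sketch below evaluates the ordered-discovery probability for a handful of hypothetical field volumes; it shows that a big-fields-first ordering is more probable than the reverse. This is only a sketch of the sampling scheme, not of the Barouch-Kaufman estimation machinery.

```python
# Sketch: probability of an observed discovery order under sampling
# proportional to size without replacement.  Field volumes are hypothetical.
import math

def log_prob_order(discovered, undiscovered):
    """log P(the fields in `discovered` are found first, in that order)."""
    remaining = list(discovered) + list(undiscovered)
    logp = 0.0
    for v in discovered:
        logp += math.log(v) - math.log(sum(remaining))
        remaining.remove(v)
    return logp

volumes = [120.0, 45.0, 8.0, 3.0, 1.5]                 # million BOE, hypothetical
print(log_prob_order([120.0, 45.0, 8.0], [3.0, 1.5]))  # large fields first
print(log_prob_order([8.0, 45.0, 120.0], [3.0, 1.5]))  # small fields first: less likely
```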

This success ratio is determined endogenously by the Arps and Roberts model [5]. Their model requires discoveries on a scale of wildcat wells and estimates future oil within field size classes. The Arps and Roberts model is based upon the assumption that the number of fields F_A(w) in size class A found by drilling w wildcat wells grows at a rate proportional to the number of fields remaining to be found in this size class, F_A(∞) − F_A(w), and to the size of the field A in areal extent. The form of the model is

F_A(w) = F_A(∞)[1 − exp(−cAw/B)] ,

where c is the efficiency of exploration and B is the effective size of the basin. Arps and Roberts applied their model to the Denver basin, a single-play basin consisting of only one producing horizon. Estimates of the average exploration efficiency c and the effective basin size B were made by Arps and Roberts on the basis of their extensive knowledge of the Denver basin. An estimate of the effective basin size is required because the area where explorationists are actually willing to locate wells is usually much smaller than the basin area initially defined on the basis of geologic or other factors. The form of this model explicitly incorporates the declining rate of return of exploratory drilling and produces forecasts of future discoveries as a function of any future increment of exploratory drilling.
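To make the declining rate of return concrete, the following sketch simply evaluates the Arps-Roberts form for a few drilling levels; the parameter values are made up for illustration and are not the Denver basin estimates.

```python
# Sketch of the Arps-Roberts form F_A(w) = F_A(inf)[1 - exp(-c*A*w/B)].
# All parameter values here are hypothetical, not the Denver basin estimates.
import math

def fields_found(w, F_inf, c, A, B):
    """Expected number of size-class-A fields found after w wildcat wells."""
    return F_inf * (1.0 - math.exp(-c * A * w / B))

F_inf, c = 60.0, 1.5        # ultimate number of fields in the class, efficiency
A, B = 4.0, 4000.0          # field area and effective basin size (same units)
for w in (500, 1000, 2000, 4000):
    now = fields_found(w, F_inf, c, A, B)
    nxt = fields_found(w + 500, F_inf, c, A, B)
    print(f"{w:5d} wells: {now:5.1f} found, {nxt - now:4.1f} more in the next 500 wells")
```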

THE DREW, SCHUENEMEYER, AND ROOT MODEL

The model developed by Drew, Schuenemeyer, and Root is based upon the geometrical concept of area of influence derived by Singer and Drew [6]. The form of the Drew, Schuenemeyer, and Root model [1] is

f_A = 1 − [1 − E_A(w)/B]^(c_A) ,

where f_A is the fraction of discovered fields in size class A, given that E_A(w) of the effective basin size B has been searched by w wells with respect to fields of size A and that c_A is the discovery efficiency. This model was developed for an exploration play area containing a single producing horizon. In this model, we represent fields as ellipses. Thus, geometrically we may think of the play area as a plane containing nonoverlapping ellipses.

This model requires as input both well-drilling and discovery information. Specifically, it requires the geographic location and completion data of all wells (whether the well was a wildcat or development well) and information on its success. For a successful wildcat well, the field size in BOE is required. The output of the model is similar to that of Arps and Roberts, namely the size distribution of undiscovered fields corresponding to some future level of exploratory drilling. A model evaluation study was performed using the historical drilling and discovery data in the Denver basin. Approximately 20% of the drilling data was required to make reasonably accurate forecasts. Details of this study are given in Drew, Schuenemeyer, and Root [1]. Recently this model has been applied to the Midland basin, which contains multiple producing horizons, by dividing the basin into depth intervals and making independent estimates of future oil within each interval (Schuenemeyer, Drew, and Bawiec, unpublished manuscript).

In the Drew, Schuenemeyer, and Root model, the three parameters to be estimated are the area exhausted E_A(w), the effective basin size B, and the discovery efficiency c_A. Then the ultimate number of fields in size class A, N_A, is estimated, and using N_A an estimate can be made of the number of fields to be found from some future increment of exploratory drilling.
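A minimal numerical sketch of how these pieces fit together is given below; the values of B, c_A, N_A, and the exhausted areas are purely hypothetical, not estimates for the Denver or Midland basins.

```python
# Sketch of the Drew-Schuenemeyer-Root form f_A = 1 - [1 - E_A(w)/B]^c_A,
# used to predict fields from a future drilling increment.  All numbers
# below are hypothetical.
def fraction_discovered(E, B, c):
    """Fraction of the size class discovered once area E of basin B is exhausted."""
    return 1.0 - (1.0 - E / B) ** c

B, c_A, N_A = 5000.0, 2.0, 40          # effective basin size, efficiency, ultimate fields
E_now, E_future = 1500.0, 2500.0       # area exhausted now and after more drilling

f_now = fraction_discovered(E_now, B, c_A)
f_future = fraction_discovered(E_future, B, c_A)
print(f"discovered so far:                {N_A * f_now:5.1f} fields")
print(f"expected from the next increment: {N_A * (f_future - f_now):5.1f} fields")
```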

The area exhausted was chosen as a measure of search effort. With this measure, each wildcat well removes an area not greater than A from the effective basin size B. Given a well file ordered by well completion dates, the area exhausted by wells with respect to an elliptical target of a given size and shape can be computed using a program developed by Schuenemeyer and Drew [7]. The results are not sensitive to the shape of the elliptical targets.

The other two parameters to be estimated are the effective basin size B and the discovery efficiency c_A. Arps and Roberts were able to make reasonable estimates for B and the average efficiency c because of their detailed knowledge of the Denver basin. However, because it may not always be possible to obtain this level of knowledge, we have chosen to estimate B and c_A from the historical drilling and discovery data. The estimate of the effective basin size B is based upon the crowding of wildcat wells. It is found by solving the following equation:

where E_A(n) is the area exhausted by n effective wildcat wells with respect to elliptical targets of size A. The number of effective cumulative wells n is needed because the area is exhausted by all wells; however, development wells are closely spaced and do not exhaust as much area as wildcats. The derivation of this estimate and that of the discovery efficiency is given by Root and Schuenemeyer [8]. The discovery efficiency c_A for fields of size A is said to be random and set equal to one if a given increment of search effort results in the same number of expected discoveries throughout the exploration history of the play. The discovery efficiency is defined to be c_A > 1 if, when searching a small increment, the chance of finding a field is c_A times greater than if the drilling were random. When c_A > 1, more of the fields are found earlier in the exploration sequence. For example, if c_A = 2, then, when the basin is 50% explored, 75% of the fields would have been discovered. In the Denver basin, the larger fields tend to be found with efficiencies greater than one, while small fields appear to be discovered randomly.
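The c_A = 2 example follows directly from the model form stated above; as a worked check:

```latex
% Worked check of the c_A = 2 example: with half of the effective basin
% exhausted, E_A(w)/B = 0.5, the fraction of fields discovered is
\[
  f_A = 1 - \left[1 - \frac{E_A(w)}{B}\right]^{c_A}
      = 1 - (1 - 0.5)^{2}
      = 1 - 0.25
      = 0.75 ,
\]
% that is, 75\% of the fields in the size class would have been found.
```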

SUMMARY

A discovery-process model is one in which the primitive assumptions about the discovery of oil are the basic model assumptions. The models I have discussed are based upon physical variables and in particular on the well-documented empirical observation that large fields tend to be found early in the exploration sequence. The Drew, Schuenemeyer, and Root model uses the historical drilling and discovery data to estimate the size distribution of future discoveries for some future number of wildcat wells. Three parameters, the area exhausted, the effective basin size, and the discovery efficiency, are estimated from the historical data. The form of the output of the model is suitable for calculating the finding and the development costs of future oil.

REFERENCES

1. L. J. Drew, J. H. Schuenemeyer, and D. H. Root, Resource Appraisal and Discovery Rate Forecasting in Partially Explored Regions: Part A, An Application to the Denver Basin, U.S. Geological Survey Professional Paper No. 1138, 1979.

2. M. King Hubbert, Energy Resources, A Report to the Committee on Natural Resources, National Academy of Sciences-National Research Council, Pub. 1000-D, 1962. [Reprinted 1973 by National Technical Information Service, U.S. Dept. of Commerce, No. PB-222401, Washington, D.C.: U.S. Government Printing Office.]

3. L. J. Drew, E. D. Attanasi, and D. H. Root, "Importance of Physical Parameters in Petroleum Supply Models," Mater. Soc. 3: 163-74 (1979).

4. E. Barouch and G. M. Kaufman, "Estimation of Undiscovered Oil and Gas," pp. 77-91 in Mathematical Aspects of Production and Distribution of Energy, Proceedings of Symposia in Applied Mathematics, Vol. 21, American Mathematical Society, 1977.

5. J. J. Arps and T. G. Roberts, "Economics of Drilling for Cretaceous Oil on East Flank of Denver-Julesburg Basin," Bull. Am. Assoc. Petroleum Geol. 42: 2549-66 (1958).

6. D. A. Singer and L. J. Drew, "The Area of Influence of an Exploratory Hole," Econ. Geol. 71: 642-47 (1976).

7. J. H. Schuenemeyer and L. J. Drew, "An Exploratory Drilling Exhaustion Sequence Plot Program," Comput. Geosci. 3: 617-31 (1977).

8. D. H. Root and J. H. Schuenemeyer, Resource Appraisal and Discovery Rate Forecasting in Partially Explored Regions: Part B, Mathematical Foundations, U.S. Geological Survey Professional Paper 1138, 1979.


Statistical Analysis of Petroleum Exploration

Peter Bloomfield
Department of Statistics
Princeton University
Princeton, New Jersey

ABSTRACT

Exploration and discovery data from Kansas and the Denver-Julesburg basin have been analyzed. Some statistical descriptions of the exploration process have been constructed, and the models for the discovery of oil fields have been examined in the light of these data.


Data Requirements for Forecasting the Year of World Peak Petroleum Production

Lawrence J. Drew and David H. Root

U.S. Geological Survey
Reston, Virginia

The two functions of the petroleum industry which will be considered here are the discovery and production of crude oil. The analysis of these functions will be carried out in terms of the physical properties of discovery and production rather than from an economic or political point of view. The thesis here is that it is possible to assemble industry data that will enable one to forecast the future rates of discovery in barrels of crude oil per exploratory well for much of the world with sufficient accuracy that one can forecast approximately the time when proved reserves will be inadequate to sustain the then current level of world production. The method by which such a forecast can be made and the data required will be outlined in a discussion of the historical performance of the petroleum industry in the Permian basin and in the conterminous United States.

In western Texas and southeastern New Mexico is a region of sedimentary rock called the Permian basin. It is over 6000 m thick in places and at the surface has an area of more than 200,000 km². Within that volume of rock, 4014 oil and gas fields were discovered between 1920 and 1975 by drilling a total of 30,320 exploratory holes. An examination of the exploration history of the Permian basin reveals patterns in the sizes of fields and their order of discovery from which reasonable forecasts of future discoveries can be made.

For the purposes of this analysis, oil and gas were combined on an energy-equivalent basis [5270 ft³ wet natural gas = 1 barrel of oil equivalent (BOE)]. The distribution of field sizes discovered in the Permian basin by the end of 1974 is shown in Fig. 1. As is clear from this figure, only a few fields are large. In fact, there are only 70 fields that contain 100 million BOE or more, and yet these 70 fields contain 62% of the oil equivalent in the 4014 fields (23 × 10⁹ BOE = 62% of 37 × 10⁹ BOE). The important characteristic of the size distribution is that most fields are small and most of the oil and gas is in a few large fields. Figure 2 shows the average size of fields discovered in 14 successive drilling increments of about 2000 exploratory holes each. Over the whole history of exploration, the average size of fields discovered declined from 121.2 × 10⁶ BOE in the first drilling increment to 24.6 × 10⁶ BOE in the next two drilling increments. In the last 11 drilling increments the average was never as high as 10 × 10⁶ BOE.
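The decline from 121.2 to 24.6 million BOE can be verified arithmetically from the Table 1 entries given below; the following is a quick sketch of that check, computing the field-weighted average for drilling increments 2 and 3.

```python
# Quick check of the 24.6 million BOE average quoted for drilling
# increments 2 and 3, computed as a field-weighted average of Table 1.
fields   = {2: 171, 3: 277}       # number of fields discovered
avg_size = {2: 27.7, 3: 22.7}     # average field size, million BOE

total_oil    = sum(fields[i] * avg_size[i] for i in fields)
total_fields = sum(fields.values())
print(f"{total_oil / total_fields:.1f} million BOE per field")   # prints 24.6
```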

The discovery history in the Permian basin can be conveniently divided into three phases: a high discovery rate phase, a transition phase, and a low stable discovery rate phase. These three phases are summarized in Table 1. The characteristic of the discovery process that we want to point out is that the big fields are discovered early in the high discovery rate phase, and then there is a rapid decline until the discovery rate finally stabilizes at a low level. This pattern in petroleum discovery rates was pointed out by Hubbert in 1967 to hold for the conterminous United States taken as a single exploration area (Fig. 3).

These examples of discovery histories in the Permian basin and the conterminous United States suggest that it is possible to forecast future discoveries accurately enough to predict (or at least recognize after the fact) when world crude-oil production will begin to decline because of inadequate on-line reserves. The data required to forecast the future discovery rate are when and where exploratory holes are drilled around the world and when and where oil fields are discovered and how large they are. The two discovery histories in the Permian basin and the conterminous United States indicate that these data are sufficient because they show the petroleum industry to be able to evaluate prospects well enough that their discovery rate is generally decreasing.


Fig. 1. The distribution of sizes of oil and gas fields, in barrels of oil equivalent, discovered in the Permian basin between 1920 and 1975. Gas is equated to oil at the rate of 5270 ft³ wet gas to one barrel of oil. The field size data were supplied by the Dallas Field Office of the Department of Energy.

Fig. 2. The average sizes of fields discovered in the Permian basin for 14 successive drilling increments. The drilling data were purchased from Petroleum Information, Inc., and the field size data were supplied by the Dallas Field Office of the Department of Energy.

Of course, when exploration moves into an area formerly closed to drilling, there can be local increases in the discovery rate. For example, the exploration of the North Sea caused an increase in the European discovery rate just as the exploration of northern Alaska caused an increase in the U.S. discovery rate. These situations are recognizable in advance, and the frontier areas can be dealt with separately. For example, one could - for the purpose of forecasting production - only allow for discoveries in the partially explored area. This would not cause great harm to a production forecast because any oil found in frontier areas would be slow coming onto the market because of the difficulties of developing fields in physically hostile areas; thus, one would have time to adjust plans to take account of the new production long before it arrived.


Table 1. Trend in average size of oil and gas fields discovered in the Permian basin, 1921-1974

  Phase      Drilling    Time period    Number of      Number     Average field size
             interval*                  exploratory    of fields  (million BOE)
                                        holes

  Phase 1        1       1921-1938         2015           111          121.2

  Phase 2        2       1939-1946         1937           171           27.7
                 3       1947-1950         2334           277           22.7
                 4       1951-1952         2122           251            8.4
                 5       1953-1954         2352           322            6.5
                 6       1955-1956         2671           377            4.5
                 7       1957-1958         2430           344            2.8
                 8       1959-1960         2228           309            2.3

  Phase 3        9       1961-1962         2102           327            3.3
                10       1963-1964         2029           362            4.1
                11       1965-1966         2100           304            2.5
                12       1967-1968         1804           271            2.4
                13       1969-1971         2157           309            3.5
                14       1972-1974         2059           288            1.4

  *Drilling intervals selected to include approximately 2000 exploratory holes.

Fig. 3. The crude-oil discovery rate in the conterminous United States for the 1.5 × 10⁹ ft of exploratory drilling carried out from 1859 to the end of 1966 (modified from Hubbert, 1967).

In any case, by the time world production is nearing its peak, there will be few such frontier areas still unexplored, and those areas that are unexplored will have been passed over for years, either because they were too expensive to operate in and hence slow to make their effect felt or because they had been judged poor prospects for significant discoveries.

Once the drilling and discovery data have been assembled, then one can construct discovery rate curves analogous to Fig. 3. The curve in Fig. 4 is an example of such a curve for the noncommunist world outside the United States and Canada. From a discovery rate curve one can estimate the volume of crude oil that will be found each year for any assumed rate of future drilling. Then it is possible to estimate the rate at which the stock of producible oil in known fields is being diminished by production and augmented by discoveries. From these estimates, one can forecast the time when rising world crude-oil production will meet declining world crude-oil production capacity.
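The bookkeeping described in this paragraph can be sketched in a few lines. Every number below is hypothetical (reserves, production, drilling rate, discovery rate, and its assumed decline); the sketch only illustrates how a discovery rate curve and an assumed drilling rate combine with production to indicate when reserves become inadequate.

```python
# Sketch of the reserve bookkeeping described above: the stock of producible
# oil in known fields is diminished by production and augmented by the
# discoveries implied by an assumed drilling rate and a declining discovery
# rate per well.  All numbers are hypothetical, not forecasts.
reserves = 600.0            # billion barrels producible in known fields
production = 22.0           # billion barrels per year
wells_per_year = 6000.0     # assumed future exploratory drilling rate
barrels_per_well = 2.5e6    # current discovery rate, barrels per exploratory well
decline = 0.95              # assumed yearly decline in the discovery rate

for year in range(1980, 2031):
    discoveries = wells_per_year * barrels_per_well / 1e9   # billion barrels
    reserves += discoveries - production
    barrels_per_well *= decline
    if reserves < 10.0 * production:        # a crude "adequacy" threshold
        print(f"reserves fall below 10 years of production around {year}")
        break
```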

Page 231: Proceedings of the 1979 DOE STATISTICAL SYMPOSIUM

230

Fig. 4. The crude-oil discovery rate (averaged over five-year periods) in the noncommunist world outside the United States and Canada for the 28,446 exploratory holes drilled in that area between January 1, 1950, and January 1, 1975. The number of wells drilled in each five-year period was taken from the Bulletin of the American Association of Petroleum Geologists. The amounts of oil discovered were taken from the Exxon Corporation.

REFERENCES

American Association of Petroleum Geologists, "Foreign Developments Issues," Am. Assoc. Petroleum Geologists Bull. 35-60 (1951-1976).

M. King Hubbert, "Degree of Advancement of Petroleum Exploration in United States," Am. Assoc. Petroleum Geologists Bull. 51(11): 2207-27 (1967).

Exxon Corporation, World Energy Outlook, New York, 1978.

D. H. Root and E. D. Attanasi, "World Petroleum Availability," presented at the Society of Plastics Engineers, 37th Annual Technical Conference, New Orleans, May 7-10, 1979.

Page 232: Proceedings of the 1979 DOE STATISTICAL SYMPOSIUM

231

WORKSHOP IV - DISCUSSION

Lee Abramson, Nuclear Regulatory Commission. What is the relation between your model and the Kaufman-Barouch and Arps-Roberts models? How do the results of your model compare with these other models when applied to the same data?

John Schuenemeyer. The assumptions that we make are somewhat different from the assumptions that Kaufman and Barouch made, and the mathematics is appreciably different. One way to characterize the difference is that our model tends to be simple in form but is certainly data intensive. Although the Kaufman-Barouch approach is very complicated mathematically, the type of data they require is very simple; however, there is a lot of conditioning required. We have compared our model to the Arps-Roberts approach, which agrees fairly closely. I don't know of any studies where we've done the same thing that Kaufman has.

Wes Nicholson, Pacific Northwest Laboratory. I have two questions for John Schuenemeyer: (1) In your first slide on predicting the total field size, what happened in 1954? It appears that there was a critical well or wells drilled that dramatically changed the prediction picture. (2) You treat the estimation as a two-dimensional problem. Are all wells drilled down through the possible oil levels, or is the depth important? Should the problem be a discovery in three dimensions?

John Schuenemeyer. Let me answer the last question first. In the Denver Basin, all of the wells are at approximately the same depths. In fact, it's actually a two-dimensional problem. In the case of the Midland Basin, that is not true. There are multiple producing horizons, and what we have done is to slice this "thing" up into various depth intervals and make independent estimates within each depth interval. We have tried this method for different depth intervals just to get some sensitivity to the choice of the depth interval. There was clearly a lot of drilling activity that took place around 1954, and a lot of the larger fields were found then. I don't remember the percentage of fields that were found after 1954. I would say that there is not a single well that made any difference.

Our results show that big fields are found for two reasons. One is simply that they are big. Our results indicate that they are found even faster than they should be found just because of their surface projection areas. So there is certainly evidence that physical and geological information come into play with respect to the search for the large fields. But when you are looking at the small fields, they appear to come in pretty much at random.

Larry Mayer. I'd like to make a couple of comments just to clarify John's remarks. Those of us who work in resource estimation may forget that there are a few statisticians in the country who haven't worked in the area. In my opening remarks, I said that the problem that we are interested in is estimating how much oil and gas there is in the world or in the United States. It is important to recognize that most of the models mentioned by John are models for how much gas and oil remains in a single basin. If we could figure out how to model a basin, then we could aggregate those models over basins and figure out how to estimate the amount of resources in the country at large and in the world at large. Because this talk referred to the Kaufman model, I'd like to outline it. Peter Bloomfield's talk will also address it.

The assumptions of the model were covered in John's talk. The model assumes that the volume of oil contained in the ith field is lognormally distributed, and the probability that you discover the ith field is proportional to its size. This simple model came out of Kaufman's thesis and led to some of the methodology used in the USGS studies. It seems to me that the primary difference between the Drew-Schuenemeyer model and the Kaufman model is that the Drew-Schuenemeyer model concentrates on analyzing fields by class size, and thus does not model the distribution over the class sizes and does not require the standard lognormality assumption.

Ram Uppuluri, Union Carbide Corporation, Nuclear Division. Does probability proportional to the ath power of size mean that the ath power of sizes is used throughout the Barouch-Kaufman model?

Peter Bloomfield. Yes, wherever the word "area" and the symbol a appear, you simply replace the word "area" by discoverability and replace the symbol a by S(a) or something.

Frosty Miller, Desert Research Institute. What is the effect of the recent "discoveries" in Mexico? Wouldn't they affect your barrels per foot pretty severely?

David Root. The curve I displayed ran through 1974, at which time those discoveries were not reported. How large those discoveries are is still very much in doubt. Although the Washington Post runs extremely large numbers, the actual numbers claimed for crude reserves are not large. The estimates of how much oil there is are simply being held secret. Although secrecy is not usually the rule, it is in this case. One of the recent discoveries was reported to be a field of a hundred billion barrels. It is difficult to say when the field was first found. It had a thousand wells in it, and, at the rate that drilling goes on in Mexico, a thousand wells takes a long time. Thus, in fact, it must be a very old field, but presumably of such poor-quality oil or poor-producing characteristics that it was never commercial. Although it wasn't really discovered in the 1970s, when the price of oil went through the roof, a field that had been known for a long time suddenly became a commercial field. It still had all the bad characteristics that held it back for so long. Also, the North Sea came in, and it was not a big enough thing to change that curve, so whether Mexico will be is doubtful.

C. Smith, Department of Energy, Energy Information Administration. I know they have been suppressing a number of oil exploration results for a number of years. When did they stop reporting their discoveries?

David Root. Those numbers were Exxon numbers, and how they got them I'm not sure. I think they were operating those fields for a long time. Presumably they had access to the actual production data, but I'm not sure if they are official government figures. They might have come out of the Exxon private files.

Larry Mayer. I'd like to comment before our discussant speaks. There is not agreement on the size of either domestic resources or international resources. I see, and other people see, the role of statisticians as reducing some of the uncertainty about these sizes. If people want to see statements to the contrary of those presented here, I suggest they look at the New York Times magazine about a week ago Sunday, which reported, in essence, that there was no evidence about how much oil exists in the United States. There was an article critical of the oil companies contending that the oil companies both collect data and disseminate information, and that in fact there could be 10,000 Spindletops left under the United States and none of us would know it one way or the other. It is like saying that we have no medical data in the United States because physicians keep the records and, because physicians have a stake in the medical profession, everything they say must be a lie. The role of statisticians is not to put out one set of numbers or another but to try to make sense out of an area that has a great deal to do with the quality of life in the country and in the world. At the same time, there is a lot of uncertainty. Thus, no numbers are uniformly blessed. There are, for example, David Root or Larry Drew numbers. For example, in Mexico some of the people doing statistical analysis think the published numbers are far too low. I think the disagreement is worth mentioning just for the record. Okay, our discussant is Lou Gordon. Lou is in the Office of Energy Information Validation of the Energy Information Administration. He holds the Ph.D. in Statistics from Stanford and has worked in the pharmaceutical industry. He came to the Department of Energy with Lincoln Moses.

Lou Gordon, Department of Energy, Energy Information Administration. Thank you, Larry. I'm going to confine my remarks to the first 2½ papers, which dealt with domestic oil resource estimation. I have a couple of reasons for doing this, not the least of which is that it conforms to some of my preconceived prejudices about what I was going to say today. I think it is entirely appropriate and fortunate that this series of talks came after Wes Nicholson's workshop yesterday on large data sets, in which cries were heard for studies of things that had to be done here. Although the speakers didn't come out and say it, we have three case histories of large data sets in front of us.

Some of my remarks are going to be addressed to the statistical issues brought up this morning; some of them will be addressed to the computational and data manipulative issues; and some of them are going to be addressed to one of my pet hobby horses, with which some of the semantic issues are involved. This is a discussion of play analysis. The first 2½ papers are discussions of play analysis and not resource analysis. The difference is that play analysis would seem to be data-intensive and more short-term in nature. If the typical play takes 15 or 20 years to develop, we're asking for about five years worth of data before we feel comfortable with these models. I think the goal of these analyses is to provide us with estimates of ultimate recovery which are within a limited geographic or geological area. I think that, from a statistical point of view, we're into about the second phase of a three-phase process. The first phase is model construction, including data gathering. Another phase is model evaluation. Do we believe that these models have much to offer? The third phase is parametric estimation. I think that a lot of Peter Bloomfield's discussion concerns the beginning of this second phase of rather intensive model evaluation, and I welcome its continuance as a consumer of statistics. I would like to say a few words about the large data sets that underlie these policies. As I was watching John's large, blue slide which had size classes and well distribution, I actually saw the thing dripping red blood and decided that the theme of many of these talks was blood, toil, sweat, and tears.

I think Peter Bloomfield's analysis, as Peter said, was the least intensive of the three, since it dealt solely with the ordering of discovery data and paid no attention to measures of intensive effort or to time sequencing of the discovery. This is natural given the way the Kaufman model is set up. The preparation of data for Bloomfield's analysis could be comfortably measured in weeks.


I think the Schuenemeyer-Root analysis is easily measured in months of effort. In David Root's tables of the Permian Basin were (although I didn't add them) at least 3000 or 4000 fields. All the fields were pasted to wells. David described the source data on wells; I know that these came from the API well ticket files, and I know what sort of efforts it took to paste those onto fields.

I think all these data sets are large for a couple of reasons. First of all, they are not reducible. There are no sufficient statistics. They are not really compactly describable either. If one would refer to metadata as data about data, the metadata associated with the data sets are huge, and the knowledge of the metadata is perhaps as important as the knowledge of the data. There is an originally very high entropy content. Peter, I know, has spent time sanitizing his Oil Scout data. I know that John, David, and Larry have done similar things to their data. We're analyzing complex processes. There is nonstationarity of the discovery process over time. Indeed, in some sense, the flavor of what we are doing, if one wanted to tie it to mathematical statistics, is that we're looking at important sampling populations. For those of you who may be interested in some of this stuff, it naturally goes under multiple names and, in the mathematical statistical literature, successive sampling is the key word that you look for. There is some work by P. K. Sen, and, although classified as finite population theory, limit results can be found in asymptotics galore.

Speaking of finite population, one of the things that has puzzled me about some of the Schuenemeyer work is a physical interpretation for c, where c is the discovery efficiency. As John was talking, I was wondering about some sort of analog to a finite population correction.

Implicit in all the discussions in all three papers is some feeling for trying to separate physical phenomena from intentional issues. These intentional issues are perhaps where a great deal of the discussion is. Semantically it is a funny business because words have multiple meanings. There are many words in these discussions that have many meanings. I will stick with some of the Bloomfield words as examples simply because I have some better feel for what he has done. A large part of the Bloomfield analysis had to do with areal extent. Areal extent is obtained essentially by taking a well count times the zoning, and the uniform zoning in Kansas is about 40 acres - except when it isn't. In some sense, then, the graphs that we saw displayed comparing resource size with areal extent are in truth resource size with number of wells. Perhaps it isn't "number of wells"; perhaps it is "number of producing wells" or "number of ever-producing wells." This goes back to the metadata problem. It also goes back, perhaps, to some of the confusion we see in these data in the sense that the same data frequently have multiple different names describing multiple different objects. There is, for example, multiple recovery, which turned out to be cumulative production to abandonment for those fields where data were available. I think that these analyses are data-bound. I think that a great amount of the effort that goes into them, and rightly so, is the construction and rationalization of the data in an effort to reduce their entropy and to increase their informational content. I think that is a process that is going to go on for quite a while.

Larry Mayer. Okay, I think it is time to open the floor for general discussion. That includes the speakers, if they have any comments they would like to make.

Peter Bloomfield. I'd like to straighten out one thing in respect to what Lou just said. The volume analysis that I just showed you today was based on Denver-Julesburg data, which was production plus reserves. Lou's reference was to some other work that used volume from Kansas, and that was indeed cumulative production at abandonment.

Larry Mayer. Lou, the difference between this area and medical research is that at least in medical research you have some basic terms like "death." For the most part, one knows what "dead" means. I'm not as concerned about making inferences out of reserve data as I am about the fact that after being in this business for two years I still don't know what reserves are. I don't have the slightest idea what parameter is being estimated by a reserve estimate. When an engineer tells you he/she is estimating reserves, and you ask what it is that is being estimated, he/she will repeat, "reserves." When you try to figure out exactly what "reserves" is, you hear things like "working inventories" and on and on. It is never quite clear exactly what a reserve is. I agree with you there are some very basic semantic problems.

David Root. One thing that is essential when estimating reserves is to estimate a number less than the number left in the field, because if the engineer estimates more he gets sacked. Another general difficulty - Can you trust the numbers? - which Larry and Lou brought up is that there is a lot of information that is secret. (However, you can tell by the behavior of the people who have it what it must be.) Watching the oil industry, we're sort of in the position of someone living in a small town in Bavaria in 1943 with nothing to read but the controlled press. Still, he can't fail to notice that the great victories are getting closer to Berlin. As you watch the oil industry, you can see those same signs of difficulty even though any individual number is difficult to defend.

Ram Uppuluri. Sampling proportional to random size yields different limit theorems as opposed to other limit theorems in general successive sampling from finite populations.

Larry Mayer. Are there any other questions? Any other discussion? Then I suggest we thank the speakers and adjourn.