
A Framework for Validation of Computer Models∗

M.J. Bayarri, J.O. Berger, D. Higdon, M.C. Kennedy, A. Kottas, R. Paulo, J. Sacks

National Institute of Statistical Sciences

J.A. Cafeo, J. Cavendish, C.H. Lin, J. Tu

General Motors

October 27, 2002

Abstract

In this paper, we present a framework that enables computer model evaluation oriented towards answering the question:

Does the computer model adequately represent reality?

The proposed validation framework is a six-step procedure based upon Bayesian statistical methodology. The Bayesian methodology is particularly suited to treating the major issues associated with the validation process: quantifying multiple sources of error and uncertainty in computer models; combining multiple sources of information; and updating validation assessments as new information is acquired. Moreover, it allows inferential statements to be made about predictive error associated with model predictions in untested situations.

The framework is implemented in two test bed models (a vehicle crash model and a resistance spot weld model) that provide context for each of the six steps in the proposed validation process.

∗This research was supported by grants from General Motors and the National Science Foundation (Grant DMS-0073952) to the National Institute of Statistical Sciences.


Contents

1 Introduction
  1.1 Motivation and overview
  1.2 Sketch of the framework
  1.3 Testbeds

2 Understanding the Model and Its Uses (Steps 1 and 2)
  2.1 Step 1. Specify model inputs and parameters with associated uncertainties or ranges – the Input/Uncertainty (I/U) Map
  2.2 Step 2. Determine evaluation criteria

3 Data Collection (Step 3)

4 Model Approximation (Step 4)

5 Analysis of Model Output (Step 5)
  5.1 Notation and statistical modeling
  5.2 Bayesian inferences
    5.2.1 Calibration/tuning
    5.2.2 Predictions and bias estimates
    5.2.3 Tolerance bounds
    5.2.4 Uncertainty decomposition
  5.3 Outline of the Bayesian methodology

6 Feedback; Feed Forward (Step 6)

7 Functional Data

8 Extrapolation Past the Range of the Data

9 Merging Predictive and Physical Approaches to Validation
  9.1 The probability that the computer model is correct
  9.2 Implementation
  9.3 Merging numerical and statistical modeling

10 Additional Issues
  10.1 Computer model simplification
  10.2 Utilization of transformations
  10.3 Modularization
  10.4 Multivariate output functions
  10.5 Updating
  10.6 Accounting for numerical instability and stochastic inputs

A Resistance Spot Weld Process Model
  A.1 Introduction
  A.2 The welding process
  A.3 The computer models

B Modeling for Vehicle Crashworthiness

C Technical details for Section 4
  C.1 The GASP response-surface methodology
  C.2 Processing stochastic inputs with GASP

D Technical details for Section 5
  D.1 Prior distribution for the bias function
  D.2 Analysis with model approximation

E Technical details for Section 7
  E.1 Kronecker product
  E.2 Analysis of function output

F Technical details for Section 8

1 Introduction

1.1 Motivation and overview

We view the most important question in evaluation of a computer model to be

Does the computer model adequately represent reality?

In practice, the processes of computer model development and validation often occur in concert; aspects of validation interact with and feed back to development (e.g., a shortcoming in the model uncovered during the validation process may require change in the mathematical implementation). In this paper, however, we address the process of computer model development only to the extent that it interacts with the framework we envision for evaluation; the bulk of the paper focuses instead on answering the above basic question. In particular, we do not address the issue of code verification. General discussions of the entire V&V process, with discussion of many other pertinent issues, can be found in Roache (1998), Oberkampf and Trucano (2000), Cafeo and Cavendish (2001), Easterling (2001), Pilch et al. (2001), and Trucano et al. (2002).

Tolerance bounds: To motivate the approach we take to model evaluation, it is useful to begin at the end, and consider the type of outputs that will result from the methodology. We do not focus on answering the yes/no question “Is the model correct?”¹ In the vast majority of cases, the relevant question is instead “Does the model provide predictions that are accurate enough for the intended use of the model?” While there are several concepts within this question that deserve – and will be given – careful definition, the central issue is simply that of assessing the accuracy of model predictions. This will be done by presenting tolerance bounds, such as 5.17 ± 0.44 for a model prediction 5.17, with the interpretation that there is a specified chance (e.g., 80%) that the corresponding true process value would lie within the specified range. Such tolerance bounds should be given whenever predictions are made, i.e., they should routinely be included along with any predictions arising from use of the model.

This focus on giving tolerance bounds, rather than stating a yes/no answer as to model validity, arises for three reasons:

1. It is often difficult to characterize regions of input variables over which the model achieves sufficient accuracy.

2. The degree of accuracy that is needed can vary from one application of the computer model to another.

3. Tolerance bounds incorporate model bias; accuracy of the model cannot simply be represented by a variance or standard error.

All these difficulties are obviated by the simple device of routinely presenting tolerance bounds along with model predictions. Thus, at a different input value, the model prediction and tolerance bound might be 6.28 ± 1.6, and it is immediately apparent that the model is considerably less accurate at this input value. Either of the bounds, 0.44 or 1.6, might be an acceptable or unacceptable predictive accuracy, depending on the intended use of the model.

¹It is possible to ask and answer this question within the proposed framework – see Section 9 – but the question is often not a relevant one.
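For concreteness, here is a minimal sketch (an editorial illustration with invented numbers, not output of the paper's analysis) of how such a tolerance bound can be read off a sample of posterior predictive draws of the kind the Section 5 methodology produces:

    import numpy as np

    # Hypothetical posterior predictive draws of the true process value at one
    # input setting; in the actual methodology these come from MCMC (Section 5).
    rng = np.random.default_rng(0)
    draws = rng.normal(loc=5.17, scale=0.34, size=10_000)

    # An 80% tolerance bound is the central interval containing 80% of the draws.
    lo, hi = np.quantile(draws, [0.10, 0.90])
    print(f"prediction {np.mean(draws):.2f}, 80% bound [{lo:.2f}, {hi:.2f}]")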

Bayesian analysis: Producing tolerance bounds is not easy. Here is a partial list of the hurdles one faces.

• There are uncertainties in model inputs or parameters, and these uncertainties can be of a variety of types: based on data, expert opinion, or simply an ‘uncertainty range.’

• Only limited model-run data may be available.

• Field data of the actual process under consideration may be limited and noisy.

• Data may be of a variety of types, including functional data.

• Model-run data and field data may be observed at different input values.

• One may desire to ‘tune’ unknown parameters of the computer model based on field data, and at the same time (because of sparse data) apply the validation methodology.

• There may be more tuning parameters than data, so that the tuning parameters are not even identifiable.

• The computer model itself will typically be highly non-linear.

• Accounting for possible model bias is challenging.

• Validation should be viewed as an accumulation of evidence to support confidence in the model outputs and their use, and the methodology needs to be able to update its current conclusions as additional information arrives.

Overcoming these hurdles requires a powerful and flexible methodology. The Bayesian approach to assessment and analysis of uncertainty, which we adopt here, is one such methodology. This approach is discussed in Section 5, together with its modern computational implementation via Markov chain Monte Carlo analysis (see, e.g., Robert and Casella, 1999).

Bridging two philosophies: At the risk of considerable oversimplification, it is useful to categorize the approaches to model evaluation as being in one of two camps. In one camp, evaluation is performed primarily by comparing model output to field data from the real process being modeled. The common rationale for this philosophy is the viewpoint that the only way to see if a model actually works is to see if its predictions are correct. We will call this the predictive approach to evaluation.

The second camp primarily focuses on the model itself, and tries to assess the accuracy or uncertainty corresponding to each constructed element of the model. The common rationale for this philosophy is that, if all the elements of the model (including computational elements) can be shown to be correct, then logically the model must give accurate predictions. We will call this the physical approach to model evaluation.

Our own view lies primarily in the predictive camp, in that a modeler faces considerable difficulty in convincing others that all elements of the model have been correctly constructed, without demonstration of validity on actual field data.

That said, it is worth noting that Bayesian methodology bridges both these philosophies. First, one can specify a prior probability that the computer model is correct and update this probability based on any available data. Thus someone in the physical camp might declare that their prior probability is 0.96 that the model is correct. If field data is then obtained, a Bayesian computation (see Section 9) might yield a posterior probability of 0.99 (in the case of supporting data) or 0.009 (in the case of non-supporting data) that the model is correct. Those in the predictive camp (including ourselves) believe that such extreme prior specification is excessively informative and only rarely justifiable.
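The computation behind such an update is simply Bayes' rule in odds form; the sketch below is illustrative, with Bayes factors invented to reproduce posterior probabilities of the same order as those quoted above.

    def posterior_prob_correct(prior_prob: float, bayes_factor: float) -> float:
        """Update P(model correct) by a Bayes factor in favour of the model."""
        prior_odds = prior_prob / (1.0 - prior_prob)
        posterior_odds = prior_odds * bayes_factor
        return posterior_odds / (1.0 + posterior_odds)

    print(posterior_prob_correct(0.96, 4.0))     # supporting data: approx. 0.99
    print(posterior_prob_correct(0.96, 4e-4))    # non-supporting data: approx. 0.01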

Even in the predictive approach, however, Bayesian analysis allows utilization of prior information about elements of the model from the physical approach (either expert opinion or partial scientific knowledge), together with field data, in the construction of the tolerance bounds for model predictions; it incorporates whatever information is available to produce defensible quantification of the adequacy of the model’s representation of reality. Furthermore, such physical knowledge can significantly reduce the amount of field data that is needed for predictive validation.

Side benefits of the methodology: Because the investment in understanding and using this methodology is admittedly significant, we mention some of the side benefits that arise from the implementation as done in the body of this paper.

1. When a bias in the model is detected by comparison with field data, the methodology automatically allows one to adjust the prediction by the estimated bias, and provides tolerance bounds for this adjusted prediction. This can result in considerably more accurate predictions than use of the model alone (or use of the field data alone).

2. A fast approximation to the computer model is available for use in situations, such as optimization, where it may be too expensive to use the computer model itself.

3. Predictions and tolerance bounds can be given for applications of the computer model to new situations in which there is little – or no – field data, assuming information about ‘related’ scenarios is available.

A Caveat: The process of model validation is inherently highly statistical, and is inherently a hard statistical problem. This is not to say that the scientific and mathematical sides of the V&V process are not also of central importance, but the basic problem cannot be solved without use of sophisticated statistical methodology. Indeed, the statistical problem is so hard that one rarely sees analyses that actually produce tolerance bounds for computer model predictions.

The intent of this paper is essentially to provide a ‘proof of concept’: that it is possible to provide tolerance bounds for predictions of computer models, while taking into account all the uncertainties present in the problem. However, the computations required in the methodology we propose can be intensive, especially when there are large numbers of model inputs, large numbers of unknown parameters, or a large amount of data (model-run or field). The test bed examples we consider in this paper are relatively modest in these dimensions, and we have yet to see how the full methodology scales up to more complex settings (although some components of the methodology are known to scale up to considerably more complex situations). It is likely that a variety of simplifications and/or innovations will be needed in such settings in order to apply the methodology.

Overview: In this paper we will restrict consideration to computer models that are deterministic, as opposed to stochastic. Section 1.2 provides an outline of the framework we recommend for computer model evaluation. Two testbed models are introduced in Section 1.3, a resistance spot welding model and a crash model. Background details of the test bed models are in Appendices A and B.

The proposed methodology for model evaluation is presented in Sections 2 through 6, with illustrations on the two test bed models. Sections 7 through 10 introduce a variety of generalizations that are needed to deal with specific contexts.

To prevent notational overload, we introduce notation and concepts as they arise in the evaluation framework. Appendices C, D, E and F present some of the technical details needed for implementation of the methodology.

1.2 Sketch of the framework

Validation can be thought of as a series of activities or steps. These are roughly ordered by the sequence in which they are typically performed. The completion of some or all in the series of activities will typically lead to new issues and questions, requiring revision and revisiting of some or all of the activities, even if the model is unchanged. New demands placed on the model and changes in the model through new development make validation a continuing process. The framework must allow for such dynamics.

Step 1. Specify model inputs and parameters with associated uncertainties or ranges – the Input/Uncertainty (I/U) map. This step requires considerable expertise to help set priorities among a (possibly) vast number of inputs. As information is acquired through undertaking further steps of the validation process, the I/U map is revisited, revised and updated.

Step 2. Determine evaluation criteria. The defining criteria must account for the context in which the model is used, the feasibility of acquiring adequate computer-run and field data, and the methodology to permit an evaluation. In turn, the data collection and analyses will be critically affected by the criteria. Moreover, initially stated criteria will typically be revisited in light of constraints and results from later analyses.

Step 3. Data collection and design of experiments. Both computer and field experiments are part of the validation (and development) processes; multiple stages of experimentation will be common. The need to design the computer runs along with field experiments can pose non-standard issues. As noted above, any stage of design must interact with the other parts of the framework, especially the evaluation criteria.

Step 4. Approximation of computer model output. Model approximations (fast surrogates) are usually key for enabling the analyses carried out in Step 5; fast surrogates are also essential when the model is used for optimization of, e.g., a manufacturing product design.

Step 5. Analyses of model output; comparing computer model output with field data. Uncertainty in model inputs will propagate to uncertainty in model output, and estimating the resulting output distribution is often required. The related ‘sensitivity analysis’ focuses on ascertaining which inputs most strongly affect outputs, a key tool in refining the I/U map.

Comparing model output with field data has several aspects.

– The relation of reality to the computer model (“reality = model + bias”)

– Statistical modeling of the data (computer runs and field data, where “field data = reality + measurement error”)

– Tuning/calibrating model input parameters based on the field data

– Updating uncertainties in the parameters (given the data)

– Accuracy of prediction given the data

The methods used here rely on a Bayesian formulation; the details are in Section 5. The fundamental goal of assessing model accuracy is addressed there.

Step 6. Feedback information into current validation exercise and feed-forward information into future validation activities. Feedback refers to use of results from Step 5 to improve aspects of the model, as well as to refine aspects of the validation process. Feed-forward refers to the process of utilizing validations of current models to predict the validity of related future models, for which field data are lacking.

1.3 Testbeds

The test beds provide context for implementing each activity and also prompt consideration of a full variety of issues. The description of the validation framework, in Section 2, does not capture the details and nuances encountered in any implementation. This fleshing out of details for the test beds is done throughout Sections 2–8, where each activity/step of validation is accompanied by explicit application to the test bed models. The result is the addition of concreteness to the generalities of the methods.

Testbed 1. The Resistance Spot Welding Model (SPOT WELD): In resistance spot welding, two metal sheets are compressed by water-cooled copper electrodes, under an applied load, L. Figure 14 in Appendix A is a simplified representation of the spot weld process, illustrating some of the essential features for producing a weld. A direct current of magnitude C is supplied to the sheets via the two electrodes to create concentrated and localized heating at the interface where the two sheets have been pressed together by the applied load (the so-called faying surface). The heat produced by the current flow across the faying surface leads to melting and, after cooling, a weld “nugget” is formed.

The resistance offered at the faying surface is particularly critical in determining the magnitude of heat generated. Because contact resistance at the faying surface, as a function of temperature, is poorly understood, a nominal function is specified and “tuned” to field data. The effect of this tuning on the behavior of the model is the focus of the test bed example.

The physical properties of the materials will change locally as a consequence of local increase in temperature. Young’s modulus and the yield stress of the sheet will fall (that is, the metal will “soften”), resulting in more deformation and an increase in the size of the faying contact surface, further affecting the formation of the weld. At the same time, the electrical and thermal conductivities will decrease as the temperature rises, all of which will affect the rate of heat generation and removal by conduction away from the faying surface.

The thermal/electrical/mechanical physics of the spot weld process is modeled by a coupling of partial differential equations that govern heat and electrical conduction with those that govern temperature-dependent, elastic/plastic mechanical deformation (Wang and Hayden, 1999).

Finite element implementations are used to provide a computer model of the electro-thermal conceptual model. Similarly, a finite element implementation is made for the equilibrium and constitutive equations that comprise the conceptual model of mechanical/thermal deformation. These two computer models are implemented using a commercial code (ANSYS).

Details of the inputs and outputs of the models are in Appendix A and are summarized in Table 1. The particular issues faced are spelled out as we proceed through the exposition in the following sections.

Testbed 2. The Crash Model (CRASH): The effect of a collision of a vehicle with a barrier is routinely assessed through a computer model implemented as a non-linear dynamic analysis code using a finite element representation of the vehicle. Proving ground tests with prototype vehicles are ultimately made to meet mandated standards for crashworthiness. But the computer models play an integral part in the design of the vehicle to assure crashworthiness before manufacturing the prototypes. How well the models perform is therefore crucial to the manufacturing process.

CRASH is implemented via a commercial code, LS-DYNA. Our focus is on the velocity changes after impact at key positions on the vehicle, such as the driver seat and radiator. Details of the model and a typical set of inputs are in Appendix B. Geometric representation of the vehicle and the material properties play critical roles in the behaviour of the vehicle after impact, and the necessary detailing of these inputs leads to very costly (in time) computer runs. Field data involve crashing of prototype vehicles and are therefore costly in dollars. CRASH is thus inherently data-limited, presenting a basic challenge to assessing the validity of the model.

Variables and sources of uncertainty in the vehicle manufacturing process and proving ground procedures induce uncertainties in the test results. The acceleration and velocity histories of two production vehicles of the same type, subjected to 30 mph zero degree rigid barrier frontal impact tests, as shown in Figure 15, demonstrate the differences in “replicate” crashes. There are a variety of materials used in components of the vehicle and, consequently, a variety of material properties to deal with, not all of which may be satisfactorily specified.


2 Understanding the Model and Its Uses (Steps 1 and 2)

The beginning of the validation process is understanding the uncertainties associated with the computer model itself, and determining how the model is to be used.

2.1 Step 1. Specify model inputs and parameters with associated uncertainties or ranges – the Input/Uncertainty (I/U) Map

Understanding what is known and not known about a computer model can be important in its evaluation. A convenient way to organize such information is through what we call the Input/Uncertainty map. (This is related to the idea of a PIRT – see Pilch et al., 2001.) The map has four attributes:

a) A list of model features or inputs of potential importance

b) A ranking of the importance of each input

c) Uncertainties, either distributions or ranges of possible values, for each input

d) Current status of each input describing how the input is currently treated in the model.

The I/U map is dynamic: as information is acquired and the validation process proceeds, the attributes, especially b)–d), will change or be updated. This will become more evident following Steps 4–6.

The inputs in the map are drawn from the development process. They will include parameters inherent to the scientific and engineering assumptions and mathematical implementation, and numerical parameters associated with the implementing code; in short, all the ingredients necessary to make the model run. Because this list can be enormous, the more important parameters must be singled out to help structure the validation process by providing a sense, albeit imperfect, of priorities. We adopt a scale of 1–5 for ranking the inputs, with 1 indicating only minor likely impact on prediction error and 5 indicating significant potential impact.
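Purely as an organizational aid (a sketch of our own, not part of the methodology itself), the rows of an I/U map can be carried in code as small records, which makes re-ranking and filtering straightforward as the map is revised; the entries below are taken from Table 1.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class IUMapEntry:
        name: str              # model input or feature
        impact: Optional[int]  # 1-5 ranking; None when the impact is unclear
        uncertainty: str       # distribution, range, or "unspecified"
        status: str            # how the input is currently treated in the model

    iu_map = [
        IUMapEntry("contact resistance u (faying)", 5, "u in [0.8, 8.0]", "tuned to data"),
        IUMapEntry("current", 5, "no uncertainty", "controllable"),
        IUMapEntry("mesh", 1, "unspecified", "convergence/speed compromise"),
    ]
    high_priority = [e.name for e in iu_map if e.impact == 5]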

SPOT WELD: The purpose of the spot weld model is to investigate the process parameters for welding aluminum. The I/U map of the model is in Table 1. The list of inputs in Table 1 is more fully described in Appendix A. Initially, only three inputs have rank 5, based on the model developer’s assessment. These three parameters (and gauge) are the focus of the initial validation experiments; earlier experiments by the model developer led to the impact assessments appearing in the table. The controllable parameters, current, load, and gauge, will be given ranges when the experiments are designed. In the current context, validation is with laboratory data, and “no uncertainty” is appropriate when current and load levels are set in the laboratory. If, however, validation is required at the production level, then uncertainties in current and load may be significant. In brief, the I/U map is context dependent.

There are several specific items connected with the I/U map in Table 1 that are worth noting. First, the most significant specified uncertainty (impact factor 5) in the model elements is that of the contact resistance. The model incorporates contact resistance through an equation that, for the faying surface, has a multiplicative constant u about which it is only known that u lies in the interval [0.8, 8.0]. It will be necessary to tune this parameter of the model with field data. The second most significant uncertainty in the model (impact factor 4) is the linear approximation for stress/strain. The modeler was unable to specify the uncertainty regarding this input, and so error in this element will simply have to enter into the overall unknown (and to be estimated) bias of the model.

INPUT                       IMPACT    UNCERTAINTY                        CURRENT STATUS

Geometry
  electrode symmetry-2d     3         unspecified                        fixed
  cooling channel           1         unspecified                        fixed
  gauge                     unclear   unspecified                        1, 2 mm
Materials                   unclear   Aluminum (2 types × 2 surfaces)    fixed
Stress/strain
  piecewise linear          4         unspecified (worse at high T)      fixed
  C0, C1, σs                3         unspecified                        fixed
Contact resistance
  1/σ = u · f ; f fixed     3         unspecified                        fixed by modeler
  u (electrode/sheet)                                                    u = 0 for 1 metal
  u (faying)                5         u ∈ [0.8, 8.0]                     tuned to data
Thermal conductivity κ      2         unspecified                        fixed
Current                     5         no uncertainty                     controllable
Load                        5         no uncertainty                     controllable
Mass density (ρ)            1         unspecified                        fixed
Specific heat (c)           1         unspecified                        fixed
Numerical parameters
  mesh                      1         unspecified                        convergence/speed compromise
  M/E coupling time         1         unspecified                        compromise
  boundary conditions       1         unspecified                        fixed
  initial conditions        1         unspecified                        fixed

Table 1: The I/U map for the spot weld model

Table 7 in Appendix B gives the corresponding I/U map for the crash model.

Initial impact assessments will be based on experience to reflect a combined judgment of the inherent sensitivity of the input (the extent to which small changes in the input would affect the output) and the range of uncertainty in the input. These will be revised through sensitivity analyses and ‘tuning with data’ that occur later in the process. Inputs about which we are “clueless” might be singled out for attention at some point along the validation path, but the effect of “missing” inputs (i.e., non-modeled features) may never be quantifiable, or may only emerge after all effects of “present” inputs are accounted for.

In model validation, considerable attention is often paid to the issue of numerical accuracy of the implemented model – for instance, in assessing if numerical solvers and finite element (FEM) codes have ‘converged’ to the solution of the driving differential equations. This is an important consideration and, as detailed in Cafeo and Cavendish (2001), is an issue of model and code verification. It should ideally be addressed early in the model development process and prior to the validation activity emphasized in this paper.

It is often the case, however, that convergence will not have been obtained; e.g., modelers may simply use the finest mesh size that is computationally feasible, recognizing that this mesh size is not sufficient to have achieved convergence. The method we are describing for validation still works. The error introduced by a lack of convergence becomes part of the ‘bias’ of the model that is to be assessed (see Section 3). The I/U map should, of course, clearly indicate the situation involving such convergence. This means that parameters such as grid size may be confounded with other assumptions about the model, making it more difficult to improve the model. Ideally, numerical accuracy would be assessed through designed experiments, varying the values of the numerical parameters.

2.2 Step 2. Determine evaluation criteria

Evaluation of a model depends on the context in which it is used. Two key elements of evaluation criteria are:

• Specification of an evaluation criterion (or criteria) defined on model output

• Specification of the domain of input variables over which evaluation is sought.

Even if only one evaluation criterion is initially considered, other evaluation criteria inevitably emerge during the validation process. In fact, it is often desirable to have multiple outputs to compare with reality to help assess the usefulness of the model. The overall performance of the model may then depend on the outcomes of the validation process for several evaluation criteria – the model may fail for some and pass for others – leading ultimately to follow-on analyses about when and how the model should be used in prediction.

Informal evaluations are typical during the development process – does the computer model produce results that appear consistent with scientific and engineering intuition? Later in the validation process these informal evaluations may need to be quantified and incorporated in the “formal” process. Sensitivity analyses may, in some respects, be considered part of evaluation if, for example, the sensitivities confirm (or conflict with) scientific judgment. We defer discussion of sensitivity to Section 10.1.

The evaluation criteria can introduce complexities that would need to be addressed at Steps 4–6, but may also affect the choices made here. For example, an evaluation criterion that leads to comparisons of curves or surfaces or images places greater demands on the analyst than simpler scalar comparisons.

Of necessity, the specifications must take into account the feasibility of collecting data, particularly field data, to carry out the validation. This can be further complicated by the need to calibrate or tune the model using the collected data, the tuning itself being driven by the specifications.


SPOT WELD: Two evaluation criteria were initially posed:

I. Size of the nugget after 8 cycles

II. Size of the nugget as a function of the number of cycles

The first criterion is of interest because of the primary production use of the model; the second as a possible aid in reducing the number of cycles needed to achieve a desired nugget size. Ideally the evaluation would be based directly on the strength of the weld, but weld diameter is taken as a surrogate because of the feasibility of collecting laboratory data on the latter. (Of course, if nugget size is not strongly correlated with weld strength, these criteria would probably be inappropriate.) In production, the spot welding process results in a multiple set of welds, but the evaluation criterion considered here involves only a single weld. Criterion (II) was later discarded as a result of the difficulty, during data collection, of getting reliable computer runs producing output at earlier times than 8 cycles.

Specification of the feasible domains of the input variables is another aspect of formulating the evaluation criteria. For the spot weld model, these domains are:

– Material: Aluminum 5182-O and Aluminum 6111-T4

– Surface: treated or untreated

– Gauge (mm): 1 or 2

– Current (kA): 21 to 26 for 1 mm aluminum; 24 to 29 for 2 mm aluminum

– Load (kN): 4.0 to 5.3

Material and surface might enter the model through other input variables relating to properties of materials. Our initial specification in Table 1 considers material, surface and gauge as fixed. The tuning parameter, u, has the range indicated and is the only other input that is not fixed.

CRASH: For the first experiment the input consists solely of the impact velocity v. The specific output data to be analyzed is the velocity of the “Sensing and Diagnostic Module”, SDM, situated under the driver’s seat, relative to a free-flight dummy. This relative velocity is obtained by subtracting the impact velocity v from the actual SDM velocity (it being assumed that the dummy maintains velocity v over the time interval of interest). The resulting functions vary (at least theoretically) between 0 at the time of impact t = 0 and −v at the time the vehicle is stationary.

The evaluation criterion we consider is the SDM velocity calculated 30 ms before the time the SDM displacement (relative to the free-flight dummy), DISP, reaches 125 mm. Call this quantity CRITV. The airbag takes around 30 ms to fully deploy, which is why this particular evaluation criterion, CRITV, is important. Our analysis takes account of the dependence between displacement and velocity (displacement is the integral of velocity) by working with the probability distribution of the velocity and then finding the implied distribution of the displacement.
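As a rough sketch of the CRITV computation just described (our own code, operating on a single sampled relative-velocity curve rather than on the full probability distribution used in the analysis; times in ms and velocities in mm/ms are an assumed convention):

    import numpy as np

    def critv(t, vel, disp_threshold=125.0, lead=30.0):
        """SDM relative velocity `lead` ms before relative displacement
        reaches `disp_threshold` mm; displacement is the integral of velocity."""
        # cumulative trapezoidal integral of the (negative) relative velocity
        disp = np.concatenate(([0.0], np.cumsum(0.5 * np.diff(t) * (vel[:-1] + vel[1:]))))
        t_star = np.interp(disp_threshold, -disp, t)  # time |displacement| hits 125 mm
        return np.interp(t_star - lead, t, vel)       # velocity 30 ms earlier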

The process we follow can be adapted to treat other evaluation criteria, such as:

• Time at which SDM displacement reaches 125 mm

• SDM velocity when SDM displacement reaches 250 mm and 350 mm

The evaluation criterion

• Velocity at the center of the radiator, RDC, 30 ms before SDM displacement reaches 125 mm

poses different issues because it requires a combined analysis of the functional data from two sensors, one located at the radiator center, the other under the driver’s seat.

3 Data Collection (Step 3)

Both computer and field (laboratory or production) experiments are part of the validation and development processes and produce data that are essential for

• Developing needed approximations to (expensive) numerical models

• Assessing bias and uncertainty in model predictions

• Studying sensitivity of a model to inputs

• Identifying suspect components of models

• Designing and collecting data that build on, and augment, existing, or historical, data.

The iterative and interactive nature of the validation and development processes will result in multiple stages of computer experiments and even field experiments.

Typically, an effort is made to construct experiments that yield data over the ranges of what are considered to be the key input values. For low-dimensional input spaces, this can be done rather informally. For instance, in CRASH, the key inputs are the impact speed of the vehicle and the collision barrier type. Table 2 exhibits the entire set of model inputs and measured field inputs for the available data. The type of data resulting from each experiment is indicated in Figure 15 of Appendix B.

When the input space is of larger dimension, it is preferable to use formal “space-filling” strategies for choosing the input values at which to experiment. For instance, in the spot weld test bed there are one discrete and three continuous input variables of major importance, and covering the 3-dimensional space with a limited number of runs (field or model) requires careful experimental design. Among the most useful designs in such contexts is the Latin Hypercube Design (McKay, Conover and Beckman, 1979). We utilize code from W. Welch to produce such designs.
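A convenient present-day stand-in for such design code (our substitution, not the Welch software used in the paper) is scipy's quasi-Monte Carlo module; the ranges below are those quoted for the 1 mm spot weld runs in the example that follows.

    from scipy.stats import qmc

    # 26-run Latin hypercube over (current C, load L, tuning parameter u),
    # with gauge held at 1 mm.
    sampler = qmc.LatinHypercube(d=3, seed=0)
    unit_design = sampler.random(n=26)               # points in [0, 1)^3
    design = qmc.scale(unit_design,
                       l_bounds=[20.0, 3.8, 1.0],    # lower bounds on C, L, u
                       u_bounds=[27.0, 5.5, 7.0])    # upper bounds on C, L, u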

SPOT WELD: For the spot weld model there was limited model data about the tuning parameter u. The initial computer experiment was therefore aimed at assessing the effect of u. The inputs to be varied are C = current, L = load, G = gauge, and u. The other inputs were held fixed. The cost – thirty minutes per computer run – is high, so a limited number, 26, of runs were planned for each of the two gauge sizes. The 26 runs for 1 mm metal covered the 3-dimensional rectangle [20, 27] × [3.8, 5.5] × [1.0, 7.0] in (C, L, u) space with G = 1, while those for the 2 mm metal covered the rectangle [23, 30] × [3.8, 5.5] × [0.8, 8.0] with G = 2. The explicit design values obtained from the Welch code are in Table 3, along with the model output and the corresponding laboratory data for the nugget diameter.

The computer runs exhibited some aberrant behavior. Many (17) runs failed to produce a meaningful outcome at cycle 8; these runs were eliminated. For reasons that are not yet clear, many runs were unable to produce reliable data for earlier cycle times; as a result, evaluation criteria depending on early cycle times were abandoned. The data retained (35 runs) are used in the subsequent analyses.

Impact velocity (km/h)   Barrier type       Impact velocity (km/h)
used in model                               of field tests

19.3                     straight frontal   19.3
25.5                     straight frontal   25.5
28.9                     straight frontal   28.9
32.1                     straight frontal   32.1
35.3                     straight frontal   35.3
38.4                     straight frontal   38.4
41.3                     straight frontal   41.3, 41.3
49.3                     straight frontal   49.4, 49.2, 49.4, 49.3, 49.3, 49.4
56.4                     straight frontal   56.4
22.5                     left angle         22.5
32.2                     left angle         32.2
40.2                     left angle         40.2, 41.4, 41.5
41.9                     left angle         41.9
49.3                     left angle         49.5, 49.2
56.2                     left angle         56.2
57.3                     left angle         57.3
28.9                     right angle        28.9
31.9                     right angle        31.9
41.7                     right angle        41.7, 41.8
48.3                     right angle        48.3
19.3                     center pole        19.3
25.5                     center pole        25.5
32.0                     center pole        32.0
36.8                     center pole        36.8
40.3                     center pole        40.3
48.6                     center pole        48.6

Table 2: Available input data


To provide insight into the space-filling nature of the Latin Hypercube Design used for gauge = 1 in Table 3, the 2-dimensional projections of this design are shown in Figure 1. An important feature of such designs is that they exercise the code over a wide range of inputs and often unearth code difficulties (for example, in Table 3 there were many failed runs for reasons not yet determined). Such designs are effective for a wide variety of purposes (sensitivity analyses, response surface approximations to model output, predicting outcomes of the specified evaluation criteria). In contexts where initial computer experimentation points to narrowing, or altering, the region for exploration specified in Step 2, new designs or augmentation of an initial design must be found. For extremely expensive model runs (or field runs), sequential designs might be considered, where each additional design point is chosen ‘optimally’ based on the information from previous runs.

[Figure 1: Latin Hypercube Design used for spot weld – 2-dimensional projections of the design points.]

Field data will usually be harder to obtain than computer experimental data. As a result, designing the field data will depend more crucially on the specifications in Section 2.2, and specific methods cannot be stated in advance.

Gauge   u       Load    Current   Nugget    Gauge   u       Load    Current   Nugget
                                  Dia.                                        Dia.
(mm)    (-)     (kN)    (kA)      (mm)      (mm)    (-)     (kN)    (kA)      (mm)
1       6.52    4.072   26.44     –         2       4.544   3.936   27.76     7.15
1       4.60    4.684   21.68     5.64      2       5.696   4.14    25.52     6.39
1       3.64    5.024   23.64     –         2       1.088   4.684   28.32     6.38
1       7.00    4.412   23.36     –         2       0.8     4.276   24.40     4.87
1       6.76    4.888   25.04     –         2       3.68    4.412   26.08     6.47
1       1.00    4.82    22.52     4.36      2       4.832   4.616   23.00     6.68
1       3.40    4.616   27.00     –         2       7.136   4.344   27.20     6.71
1       5.32    4.48    20.84     6.12      2       4.256   5.228   24.68     6.54
1       2.92    5.092   20.56     5.00      2       3.392   4.004   23.28     5.97
1       1.48    5.364   21.12     4.53      2       1.952   4.48    23.84     5.72
1       2.20    4.004   21.40     5.20      2       2.528   3.8     24.96     6.23
1       2.68    4.344   25.88     –         2       2.24    4.208   29.72     –
1       2.44    5.50    23.08     –         2       1.376   5.024   25.80     5.46
1       4.36    3.80    25.32     –         2       7.424   4.072   28.88     –
1       1.24    4.208   24.76     6.06      2       6.272   4.548   29.16     7.36
1       6.04    4.752   20.00     –         2       6.848   5.364   23.56     –
1       5.56    5.432   25.60     –         2       3.968   4.888   29.44     7.16
1       1.96    4.956   26.16     6.69      2       3.104   5.432   28.60     6.61
1       5.80    3.936   23.92     7.17      2       5.12    5.5     26.64     5.98
1       4.84    4.14    22.80     –         2       6.56    3.868   26.36     6.74
1       3.16    3.868   22.24     5.71      2       5.984   4.956   24.12     5.32
1       6.28    5.228   21.96     5.38      2       8       5.092   28.04     –
1       1.72    4.548   24.20     5.85      2       2.816   4.82    26.92     6.70
1       5.08    5.16    26.72     –         2       5.408   5.16    30.00     –
1       4.12    5.296   24.48     6.87      2       1.664   5.296   27.48     6.02
1       3.88    4.276   20.28     4.91      2       7.712   4.752   25.24     5.50

Table 3: Spot weld data from 52 model runs. Run failures indicated by –

In Sections 4, 5 and 7 we set down an informal description and assumptions for the computer model data and field data. This includes consideration of calibration parameters, function output and the treatment of the arguments of such functions. A more formal description can be found in Bayarri et al. (2002).

4 Model Approximation (Step 4)

It is often of interest to see the effect of uncertainty in model inputs on model outputs. When the code is cheap to run, then straight simulation (i.e., randomly generate input variables from their distributions and compute the corresponding model outputs) is a practical option for determining the output distribution; a code sketch of this option follows the list below. More refined methods relying on pseudorandom (e.g., Latin hypercube) generation of inputs can also be employed – at least when the number of input variables is modest – and somewhat extend the range of applicability of straight simulation. None of these techniques are feasible, however, for expensive codes, and one must then resort to model approximations to obtain output distributions. Such approximations can also be useful in their own right, for at least the following reasons.



• It might not be feasible to directly employ the model ‘in the field’, whereas a fast approximation to the model could be directly employed.

• It is often desired to perform an optimization over inputs. Common optimization algorithms can be too expensive to implement with the computer model, but can be implemented with the approximation (or at least the approximation can be used to significantly narrow the range of input values over which optimization with the computer model needs to be done).

• Finding optimal designs for additional model-development or validation experiments can require a fast approximation to the computer model.

• In Step 5, we will make crucial use of model approximations in implementing the calibration and validation methodology.
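The ‘straight simulation’ option mentioned before the list above amounts to a few lines of code when the model is cheap; this sketch uses an invented two-input toy model, not one of the test beds.

    import numpy as np

    def cheap_model(x1, x2):
        # stand-in for a computer model that is inexpensive to run
        return np.sin(x1) + 0.5 * x2 ** 2

    rng = np.random.default_rng(1)
    n = 100_000
    x1 = rng.normal(0.0, 0.2, n)       # input uncertainties, e.g. from the I/U map
    x2 = rng.uniform(0.8, 1.2, n)
    outputs = cheap_model(x1, x2)      # the propagated output distribution
    print(np.mean(outputs), np.quantile(outputs, [0.1, 0.9]))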

There are four basic techniques that can be useful in model approximation: (i) use of models having lower resolution (e.g., larger mesh size) or including only significant basis elements (based on, e.g., Proper Orthogonal Decomposition or Principal Components methods); (ii) linearization/Gaussian error accumulation; (iii) response surface methodology, including Gaussian processes and neural networks; (iv) Bayesian networks, which allow uncertainty transference between sub-models from which the model is constructed. The first technique is always an option, and can be combined with the other methods; of course, evaluation of the error introduced by using a model of lower resolution (or with a smaller basis) can be difficult. The second technique, which essentially linearizes the model so that (Gaussian) input distributions can be passed through the model using linear Gaussian updating, is useful if it is feasible to work with the underlying code of the model and if linearization does not introduce severe bias. The use of Bayesian networks is not addressed here.

A very useful general tool, for models whose output depends smoothly on inputs (very common in engineering and scientific processes), is the response surface technique. (It should be noted that, even when the underlying process is not a smooth function of the inputs, one is often primarily interested in features of the output that are smooth.) The approach we recommend has been successfully used when the number of input variables is less than 20 (typically requiring fewer than 10 runs per input), and even as high as 40 (although then several hundreds of model runs may be needed for accurate fitting). Below we briefly describe this technique. The particular technique we recommend meshes well with the validation analysis proposed in Step 5.

Notation: Denote model output by y^M(x, u), where x is a vector of controllable inputs and u is a vector of unknown calibration and/or tuning parameters in the model. Sometimes we write z = (x, u).

In specific examples u may be absent. The goal is to approximate y^M(x, u) by a function ŷ^M(x, u), to be called the model approximation, which is much easier to compute. In addition, it is desirable to have a variance function V^M(x, u) that measures the accuracy of ŷ^M(x, u) as an estimate of y^M(x, u). A response surface approach that achieves both these goals is the Gaussian process response surface approximation (GASP), described in Sacks et al. (1989) and Kennedy and O’Hagan (2001); the approach is outlined below.

SPOT WELD: The vector of controllable inputs is x = (C, L, G); the tuning parameter is u. Use of GASP with the data from Table 3 leads to the response surface approximation to y^M(C, L, G, u) that is exhibited in Figure 2. We do not explicitly show the variance function, but it is available. For instance, at (C, L, G, u) = (26, 5, 2, 4), the response surface approximation to y^M(26, 5, 2, 4) is ŷ^M(26, 5, 2, 4) = 6.12, and the variance of the approximation is V^M(26, 5, 2, 4) = 0.0046. At the values of the actual model data of Table 3, i.e., the solid dots in Figure 2, the response surface approximation is exact; it ‘passes through’ these points. The slight up-curve at the edges, for extreme values of u, occurs because the model data in those regions is very sparse and an overall mean level was used in the GASP analysis (as opposed to, say, a linear function). This has essentially no effect on ultimate predictions, since we will see that the central values of u are those that are most relevant.
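A minimal Gaussian-process surrogate in the spirit of GASP can be sketched with scikit-learn; this is a convenience substitution (a different correlation family and maximum likelihood fitting rather than the fully Bayesian treatment of Appendix C.1), fit here to a handful of the model runs from Table 3.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF, ConstantKernel

    # A few model runs from Table 3: columns are (C, L, G, u); y is nugget diameter.
    X = np.array([[21.68, 4.684, 1, 4.600],
                  [22.52, 4.820, 1, 1.000],
                  [20.84, 4.480, 1, 5.320],
                  [25.52, 4.140, 2, 5.696],
                  [24.40, 4.276, 2, 0.800]])
    y = np.array([5.64, 4.36, 6.12, 6.39, 4.87])

    kernel = ConstantKernel(1.0) * RBF(length_scale=[1.0] * 4)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)

    # The posterior mean plays the role of the fast approximation and the
    # squared posterior standard deviation that of the variance function.
    mean, sd = gp.predict(np.array([[26.0, 5.0, 2, 4.0]]), return_std=True)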

CRASH: The controllable inputs are x = (v, B), where v is the impact velocity and B is the barrier type. There is no tuning parameter.

Let y^M = (y^M(x_1, u_1), ..., y^M(x_m, u_m)) denote the vector of m evaluations of the model at the inputs D^M = {(x_i, u_i) : i = 1, ..., m}. The computer model is exercised only at the inputs D^M, so that y^M(z) = y^M(x, u) is effectively unknown for other inputs z = (x, u). Thus, in the Bayesian framework, we assign y^M(z) a prior distribution, specifically, a stationary Gaussian process with mean and covariance functions governed by unknown parameters. (In application, we always deal only with a finite set of z_i, in which case the Gaussian process at these points reduces to a multivariate normal distribution.) Choice of the mean function is discussed below, but discussion of the covariance function is delayed until Appendix C.1.

[Figure 2: GASP response surface approximation ŷ^M to y^M for the spot weld model, constructed from the data in Table 3. The surfaces show estimated weld diameter, for the two gauge values, as a function of load and current for various values of the tuning parameter u. The solid dots denote model data y^M(C, L, G, u) and are plotted on the surface corresponding to the closest value of u.]


The mean function of the Gaussian process will be assumed to be of the form Ψ(·)θ_L, where Ψ(z) is a specified 1 × k vector function of the input z and θ_L is a k × 1 vector of unknown parameters. A constant mean (k = 1, Ψ(z) = 1, and θ_L = θ) is often satisfactory if one plans only to use the model approximation within the range of the available model-run data, but a more complicated mean function can be useful if the model approximation is to be used outside the range of the data. (When outside the range of the model-run data, the Gaussian process approximation to the model will gradually tend towards its estimated mean function, so that an accurate estimated mean function will provide more accurate model approximations.) This can be especially important when features such as seasonal trends are present. Formally, the mean function above does not allow the presence of a known constant (e.g., c + Ψ(·)θ_L), but this can be easily accommodated by carrying out the analysis with the Gaussian process defined by subtracting c from the original process.

A secondary benefit of introducing a mean function that is a reasonable approximation to the model is that it will often result in smaller variances for the model approximations. If, however, a more complicated mean function is used but is not a more reasonable approximation to the model, it will result in larger variances, since it will contain more parameters that must be estimated. No firm guidelines are available as to whether a simple mean function or a carefully developed mean function is best. Our recommendation is to try to incorporate into the mean function any obvious trends that exist in the model output but, again, even a constant mean function is often satisfactory.

CRASH: The computer model output corresponding to velocity, in a typical case, is indicated in Figure 15 (the curve on the right). Such curves are clearly better modeled by a linear function of time than a constant in time. Furthermore, we know the initial velocity v of the vehicle, so use of the mean function v(1 − θ_L t) for the Gaussian process will clearly do a better job of approximating the computer model than would a constant mean. (It should be emphasized, however, that the methodology will typically provide accurate within-sample approximation to the model output no matter what mean function is chosen for the Gaussian process.) We actually follow common practice in this area and first transform the data by subtracting the initial velocity, leading to what are called ‘relative velocity’ curves; for relative velocity curves, the natural mean function would be −θ_L v t, corresponding to choosing k = 1 and Ψ(v, t) = −vt in the above notation. Note that since the theoretical range of the relative velocity is from 0 (at time t = 0) to −v (at time t_s, when the vehicle reaches stationarity), θ_L can here be interpreted as 1/t_s.
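One simple way to realize such a non-constant mean in practice (a sketch of ours, not the paper's implementation, which handles θ_L within the Bayesian analysis) is to fit the trend coefficient by least squares and let the Gaussian process model the residuals:

    import numpy as np

    def fit_linear_trend(v, t, y):
        """Fit the mean function -theta_L * v * t to relative-velocity outputs y.

        v, t, y are arrays of initial speeds, times, and model outputs; the
        returned residuals are what the Gaussian process is then fit to.
        """
        psi = -(v * t)                         # Psi(v, t) = -v*t, with k = 1
        theta_L = (psi @ y) / (psi @ psi)      # least-squares estimate of theta_L
        return theta_L, y - theta_L * psi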

For specified values of the parameters (such as θ_L) of the Gaussian process, the GASP behaves as a Kalman filter, yielding a posterior mean function that can be used as the fast approximation to y^M(·), together with a variance measuring the uncertainty in the approximation. (Details are given in Appendix C.1.) Note that this variance is zero at the design points at which the function was actually evaluated. The model approximation obtained through the GASP theory can thus roughly be thought of as an interpolator of the data, unless there is numerical instability in the computer model, as mentioned in footnote 2, in which case the approximation smoothes the data.

Unfortunately, the parameters (such as θL) of the Gaussian process are rarely, if ever, known. Two possibilities then arise:

a) Plug in some estimates, for instance maximum likelihood estimates (as in the GASP software of W. Welch; see also Bayarri et al. (2002)), pretending they are the ‘true’ values.

b) Average over the posterior distribution of the parameters, in a full Bayesian analysis (as described in Section 5).

The full Bayesian analysis is typically superior, in the sense that the resulting variance of the model approximation will more accurately reflect reality, since the parameters are unknown. In terms of the actual model approximation ŷM(x, u), however, use of maximum likelihood estimates of the parameters typically yields much the same answers as the full Bayesian analysis, and so may be preferable in computationally intensive situations.
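A sketch of the plug-in option a): estimate the correlation length-scale by maximizing the likelihood of the model-run data (with the process variance profiled out), then treat the estimate as the true value. The zero-mean setup and toy data are illustrative assumptions; this is only a schematic stand-in for what software such as W. Welch's GASP does.

```python
# Sketch of plug-in maximum likelihood estimation of a GP length-scale
# (zero mean, process variance profiled out); purely illustrative.
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_lik(log_len, z, y):
    R = np.exp(-((z[:, None] - z[None, :]) / np.exp(log_len)) ** 2)
    R += 1e-10 * np.eye(len(z))
    L = np.linalg.cholesky(R)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # R^{-1} y
    sigma2_hat = y @ alpha / len(z)                       # profiled variance MLE
    return 0.5 * (len(z) * np.log(sigma2_hat) + 2.0 * np.log(np.diag(L)).sum())

z = np.linspace(0.0, 1.0, 8)
y = np.sin(2 * np.pi * z)                                 # toy model-run data
fit = minimize_scalar(lambda ll: neg_log_lik(ll, z, y),
                      bounds=(-5.0, 2.0), method='bounded')
print("plug-in length-scale estimate:", np.exp(fit.x))
```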

The particular GASP approach that we use has the added bonus that certain types of stochastic inputs, z, can easily be handled within the same framework; see Appendix C.2 for details.

5 Analysis of Model Output (Step 5)

In this section, we describe the basics of the statistical modeling and analysis that are used for model evaluation. For illustration in this section we use only SPOT WELD, since CRASH has a functional data structure that we do not introduce until Section 7.

5.1 Notation and statistical modeling

The model is an approximation to reality. Another way of saying this is that the model is a biased representation of reality, and accounting for this bias is the central issue for model validation. There are (at least) three sources for this bias:

1. The science or engineering used to construct the model may be incomplete.

2. The numerical implementation may introduce errors (e.g., may not have converged).

3. Any tuned parameters may be in error.

Furthermore, the model alone cannot provide evidence of bias. Either expert opinion or field data is necessary to assess bias – we focus on the latter.

Recall that we denote by yM(x, u) the model output when (x, u) is input. When u is not present, as in CRASH, we can statistically model “reality = model + bias” as

yR(x) = yM (x) + b(x) , (5.1)

where yR(x) is the value of the ‘real’ process at input x and b(x) is the unknown bias function, arising from the sources discussed above. When u is present we call its true (but unknown) value u∗ and then model the bias via

yR(x) = yM (x,u∗) + b(x). (5.2)

Field data at inputs x1,x2, . . . ,xn are obtained, and modeled as

yF(xi) = yR(xi) + εF_i ,   (5.3)

where the εF_i are independent Normal random errors with mean zero and variance 1/λF. Note that u is not an input in determining the field data. (We could have included u∗ in the definition of yR and b, but that would have simply been extra notational burden.) These assumptions may only be reasonable after suitable transformations of the data and, in any case, more complicated error structures can be easily accommodated. For example, the εF_i can have a correlated error structure; indeed, this will be seen to be the case in dealing with CRASH.
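As a concrete reading of (5.2)-(5.3), the following sketch generates synthetic ‘field data’ from a toy model plus a smooth bias and independent Normal noise. The model form, the bias function, and all numerical values are invented for illustration, not taken from the test beds.

```python
# Data-generating sketch of (5.2)-(5.3); all forms and numbers are illustrative.
import numpy as np
rng = np.random.default_rng(0)

u_star, lam_F = 3.96, 4.0                       # 'true' u and field precision
y_M = lambda x, u: u * np.sqrt(x)               # stand-in computer model
b   = lambda x: 0.3 * np.sin(x)                 # bias, unknown to the analyst

x_field = rng.uniform(0.5, 2.0, size=10)        # field-test inputs
y_R = y_M(x_field, u_star) + b(x_field)                      # reality, (5.2)
y_F = y_R + rng.normal(0.0, 1.0 / np.sqrt(lam_F), size=10)   # field data, (5.3)
```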

The assumption that εF has mean zero is formally the assumption that the field observations have no bias. If the field observations do have bias, the situation is quite problematic, in that presumably the field experiments were designed so as to eliminate bias, yet failed to completely do so. If bias does exist in the field observations, there is no purely data-based way to separate the field bias from the model bias; expert opinion would typically be needed to make any such separation. Estimates of bias that arise from our methodology could still be interpreted as the systematic difference between the computer model and field observations, but this is of little interest, in that prediction of reality (not of possibly biased field data) is the primary goal. Note that it is quite common for ‘existing field data’ to itself be biased (see, e.g., Roache, 1998), and obtaining unbiased field data is perhaps the most crucial aspect of model validation. See Trucano et al. (2002) for extensive discussion.

Assuming computation of yM is fast, Bayesian analysis now proceeds by specifying prior distributions for the unknown elements of the model,

– the probability density p(u) for u, which we take to be that specified in the I/U map;

– a prior density p(λF) for the precision (the inverse of the variance) of the field measurement error – see Bayarri et al. (2002) for a description of the prior we use;

– a prior density for the bias function b(x) (see Appendix D.1),

and utilizing Bayes theorem. (For full details on these priors see Bayarri et al. (2002).) Typically, however, yM is a slow computer model, and we will then need to also incorporate the model approximation from Section 4 into the Bayesian analysis, so that the model output yM is then viewed as the Gaussian process discussed therein. (This will be necessary in both the SPOT WELD and CRASH test beds, since the corresponding computer models are too expensive to run directly within the Bayesian computation.)

5.2 Bayesian inferences

Section 5.3 discusses implementation of the Bayesian analysis. Here we focus on discussion of the possible outputs of the analysis. The basic output from the Bayesian analysis is the posterior probability distribution of all unknown quantities, given the models and the data (model runs and field data). The key feature of the Bayesian approach is that this distribution incorporates all uncertainties in the problem, including uncertainties as specified in the I/U map and measurement errors in the data. From this probability distribution, a variety of quantities of interest can be computed and analyses made.

5.2.1 Calibration/tuning

Tuning, the use of field data to bring the model closer to reality, is often confused with calibration, the process by which unknown model parameters are estimated from data. The distinction is that, in calibration, one tries to find the true – but unknown – physical value of a parameter, while in tuning one simply tries to find the best-fitting value. Calibration and tuning parameters are mathematically the same and are therefore treated identically in the analysis, but conceptually there is a potentially significant difference. Tuning will tend to give a better model for prediction with inputs in the range of the field data, but may well give worse predictions outside this range. For this reason, it is not uncommon for modelers to limit the extent of tuning. This can be done, if desired, by simply restricting the allowed range of variation in the tuning parameter (or the spread in the prior distribution of the tuning parameter) in the I/U map.

One often hears that data used for calibration/tuning cannot simultaneously be used for model validation. However, Bayesian methodology does formally allow such simultaneous use of data. In part, this is because the Bayesian analysis does not simply replace the parameter by some optimal ‘tuned’ parameter value û, but rather utilizes its entire posterior distribution, which reflects the uncertainty that exists in the value of the parameter.
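The posterior distribution of a tuning parameter can be sampled with standard MCMC. Below is a minimal random-walk Metropolis sketch for a single parameter u under a uniform prior and, for brevity, no bias term; every modeling choice here is an illustrative assumption, and the paper's full analysis retains the bias function and the other unknowns.

```python
# Random-walk Metropolis sketch for the posterior of a tuning parameter u.
# Toy model, synthetic data, uniform prior on (0, 10): all assumed.
import numpy as np
rng = np.random.default_rng(1)

y_M = lambda x, u: u * np.sqrt(x)                # stand-in computer model
x_f = np.linspace(0.5, 2.0, 10)                  # field inputs
y_F = y_M(x_f, 3.96) + rng.normal(0.0, 0.5, 10)  # synthetic field data
lam_F = 1.0 / 0.5**2                             # field precision (known here)

def log_post(u):
    if not 0.0 < u < 10.0:                       # uniform prior support
        return -np.inf
    return -0.5 * lam_F * np.sum((y_F - y_M(x_f, u)) ** 2)

u_cur, draws = 5.0, []
for _ in range(20000):
    u_prop = u_cur + 0.2 * rng.normal()          # random-walk proposal
    if np.log(rng.uniform()) < log_post(u_prop) - log_post(u_cur):
        u_cur = u_prop
    draws.append(u_cur)
print("posterior mean of u:", np.mean(draws[5000:]))   # after burn-in
```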

SPOT WELD: The vector of controllable inputs is x = (C, L, G) and the tuning parameter is u. (This could also be viewed as a calibration parameter, since it corresponds to an unknown feature of the contact resistance which is, in essence, being estimated from the field data.) As mentioned above, the Bayesian analysis produces complete posterior distributions for the unknowns in the model. For instance, Figure 3 gives the posterior density of u. The optimal tuned value of u is the mean of this distribution, which is û = 3.96. Note that there is considerable uncertainty in this value, and the Bayesian analysis will take this uncertainty into account in all assessments of variance and accuracy. Doing so also helps alleviate the type of over-tuning that can result if one were to simply pick and use the best-fitting parameter value. In this regard it is also interesting to note that the two main modes of the posterior correspond to tuning on the gauge = 1mm and gauge = 2mm data separately; use of either data set alone would likely have worsened the over-tuning.

Figure 3: The posterior distribution of the tuning parameter u.

5.2.2 Predictions and bias estimates

Assume we want to predict the real process yR(x, u∗) at some (new) input x. The preferred approach is to base this prediction on the output from a new model run at input x (Case 1). Sometimes this is not feasible (as when it is desired to produce a grid of predictions, as in Figure 4), and then predictions must be based on use of the model approximation (Case 2). We describe the analysis separately for these two cases.

Case 1. Predictions utilizing a new model run: When using a new model run (a new piece of data) for predicting the underlying process yR(x, u∗), we have at least two options. First, we can simply obtain an estimate û and run the model at inputs (x, û) to obtain a prediction; this will be called model prediction. The second and preferred approach is to use bias-corrected prediction, in which the model prediction is corrected by an estimate of the bias. The predictors, their bias, and their associated variances are specified below (with full details given in Subsection 5.3).

Model prediction: The most commonly used predictor of yR(x, u∗) is yM(x, û), for some estimate û of the tuning parameter. (We recommend use of the posterior mean of u, but the argument applies to any estimate.) The accuracy of this estimate is determined by its variance, V_û(x), which is one of the outputs of the Bayesian analysis.

It is often of separate interest to estimate the bias of the prediction yM(x, û). This is given by

b_û(x) = yR(x) − yM(x, û) .   (5.4)

The variance of this estimated bias is also available from the Bayesian analysis.

SPOT WELD: For G=2, L=4.888 and C=29.44, and using the posterior mean û = 3.96, the pure model prediction, resulting from running the computer model at these inputs, is yM(4.888, 29.44, 2, 3.96) = 7.16. The variance of this prediction is V_3.96(4.888, 29.44, 2) = 0.628, and the estimated bias of the prediction is b̂_3.96(4.888, 29.44, 2) = 0.342.

Bias-corrected prediction: An important observation is that one can improve upon the pure model prediction yM(x, û). Indeed, since an estimate of the bias is available, it is clear that

ŷR(x) = yM(x, û) + b̂_û(x)   (5.5)

would be the optimal predictor of the actual process value yR(x). Furthermore, the variance of this improved prediction can be shown to be (V_û(x) − [b̂_û(x)]²), which can be significantly smaller than the variance, V_û(x), of the pure model prediction.

SPOT WELD: For G=2, L=4.888 and C=29.44, and using the posterior mean û = 3.96, the bias-corrected prediction is ŷR(4.888, 29.44, 2) = 7.16 + 0.342 = 7.50, with a variance of 0.628 − 0.342² = 0.512. Bias-correction here has not resulted in a significantly reduced variance (compare with 0.628 for the pure model prediction), because the amount of bias was rather modest at this input value. We will see in Figure 6 that the bias can be significantly greater (as high as 1.0) at other input values.

There are several important subtleties in the above analysis. The first is that, in principle, a superior model prediction could be obtained by ‘averaging’ yM(x, u), at the new input x, over the posterior density of u. This cannot be done, however, if the model is expensive to run. The recommended analysis in (5.5) achieves a compromise by utilizing the information from the new model run, yM(x, û), but also averaging yM(x, u) over other values of u through the fast model approximation. A related point is that the bias defined in (5.4) is different from that defined earlier in (5.1); this earlier bias was defined relative to the true (but unknown) value u∗, rather than the estimated value û. The Bayesian analysis properly accounts for this definitional difference.

Case 2. Approximate prediction, based solely on previous model runs: If it is not feasible to evaluate yM(x, û) at the new input value x (for instance, if prediction is desired at many new inputs), one can still proceed with prediction of yR(x, u∗), using the model approximation ŷM(x, u). Indeed, since the model approximation is fast, ‘averaging’ the model approximation ŷM(x, u) over the posterior density of u is now feasible, which would lead to bias-corrected prediction. We also consider pure model prediction, which is here given by ŷM(x, û), and the corresponding estimated bias function, defined analogously to (5.4).

SPOT WELD: Using solely the previous model runs (and field data), and the model approximation ŷM, Figure 4 gives the pure model predictions ŷM(L, C, G, û), the estimated bias functions b̂_û(L, C, G), and the bias-corrected predictions ŷM(L, C, G, û) + b̂_û(L, C, G), as discussed above, for the spot weld model. For each gauge, these are presented as surfaces (as a function of L and C), with the height again being the predicted weld nugget diameter.

Note that the information obtained by running the computer model to obtain yM(x, û) can considerably improve the prediction (and reduce the variance of the prediction), so the Case 1 analysis should be done when possible. This is particularly true when ‘local’ predictions are being made, such as predicting the effect of changing from input x to input x′, where x and x′ are close. It will often then be the case that the pure model prediction of the difference, yM(x, û) − yM(x′, û), is close to the optimal bias-corrected prediction, ŷR(x) − ŷR(x′), and has much smaller variance than if the same prediction were made based on the fast model approximation alone. The reason is that the bias, being smooth, essentially cancels when one computes the difference of model predictions at close values of the input. The bias would also cancel in the analysis based on the fast model approximation, but the comparatively significant uncertainty in the fast model approximation (as an estimate of the actual computer model) would remain. On the other hand, for simply predicting the process at a new input, the size of the bias correction will often be more significant than the uncertainty in the fast model approximation.

Figure 4: For a Case 2 analysis using only previous model runs. Left figures: the weld diameter predictions ŷM(L, C, G, û) from the model approximation; Middle figures: the biases b̂_û(L, C, G); Right figures: the bias-corrected predictions ŷM(L, C, G, û) + b̂_û(L, C, G). The circles represent the field data that were utilized in this analysis.

This helps to explain the often-heard comment by modelers that, even when the overall model predictions are not particularly accurate, predictions of process changes arising from small changes in inputs often seem to be quite accurate. It also partly explains why statistical analysis of the field data alone does not yield as useful predictions as an analysis which incorporates the model information. For some global predictions, the statistical analysis alone might be nearly as good, but for exploring fine details of the process under study, the information from the computer model typically dominates. This discussion also underscores the fundamental importance of using a method of analysis that can accommodate, and properly weight, these different types of information.

5.2.3 Tolerance bounds

Predictive accuracy statements, such as “with probability 0.90, the prediction is within a specified tolerance τ of the true yR(x),” are obtainable from the Bayesian analysis and provide a single simple measure of the effectiveness of the computer model. These can be obtained both with and without running the model at the (new) input x (Cases 1 and 2, respectively), and with or without correcting for bias. Recall that bias correction results in smaller variances.
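Operationally, such tolerance statements come straight from posterior percentiles of the MCMC draws of the predicted quantity; a one-line sketch (with invented draws standing in for MCMC output) follows.

```python
# 90% tolerance bounds as the 5th and 95th posterior percentiles; the draws
# here are synthetic stand-ins for MCMC output.
import numpy as np
draws = np.random.default_rng(2).normal(7.16, 0.7, 10000)  # draws of y_R(x)
lo, hi = np.percentile(draws, [5, 95])
print(f"90% tolerance bounds: ({lo:.2f}, {hi:.2f})")
```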

Case 1. Obtaining a new model run at input x improves prediction and results in tighter tolerance bounds.

SPOT WELD: For G=2, L=4.888 and C=29.44, and using the posterior mean û = 3.96, the pure model prediction was yM(4.888, 29.44, 2, 3.96) = 7.16. The 5% and 95% percentiles of the posterior distribution give the 90% tolerance bounds, which, in this case, are (6.02, 8.30). Similarly, the bias-corrected prediction is 7.50, with associated 90% tolerance bounds (6.15, 8.30).

Case 2. If it is not feasible to obtain a new model run, or we have to give tolerance bounds for many new inputs (as when drawing a graph), then predictions and tolerance bounds are based only on the previous model runs (and field data). Note that the resulting tolerance bounds will typically be wider.

SPOT WELD: Figure 5 provides 90% tolerance bands for two typical cases, one of low load (L = 4.0) and one of high load (L = 5.3), for each of the two gauges. In particular, the graphs present the pure model and bias-corrected predictions, and the error bands are 90th percentile bands for ŷM(x, û) and ŷR(x). Thus, for the top figures, there is a 90% probability for a specified current, load, and gauge that the real nugget size lies between the upper and lower dotted lines; the model approximation ŷM(x, û) at the optimal value û = 3.96 (see Figure 2) is indicated by the solid line. Note that the errors for the bias-corrected predictions (see the lower figures) are considerably smaller.

5.2.4 Uncertainty decomposition

Bayesian analysis not only allows for incorporation of all uncertainties into the accuracy statements, but also enables decomposition of the uncertainty into its component parts. For instance, in the overall model we use for SPOT WELD, there are three sources of error: uncertainty in the tuning parameter u, uncertainty in the bias function b(x), and uncertainty in the residual error εF (which can arise from random error in the field data and/or randomness inherent in the actual process). One can separately assess, and report, the variation inherent in each of these sources, which can be important for determination of sensitivities and for improving the model. (Indeed, this aspect of the analysis can be considered to be a part of ‘sensitivity analysis’, as discussed in Section 10.1.)
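A sketch of how such a decomposition can be reported from MCMC output: keep the draws of each unknown separately and quote an interval per source. The draws below are synthetic stand-ins, not values from the SPOT WELD analysis.

```python
# Per-source 90% intervals from (synthetic) posterior draws of each unknown.
import numpy as np
rng = np.random.default_rng(3)
yM_u  = rng.normal(7.0, 0.32, 10000)    # y_M(x, u) with u random
bias  = rng.normal(0.1, 0.68, 10000)    # bias b(x)
eps_F = rng.normal(0.0, 0.43, 10000)    # residual field error
for name, s in [("y_M(x,u)", yM_u), ("b(x)", bias), ("eps_F", eps_F)]:
    lo, hi = np.percentile(s, [5, 95])
    print(f"{name}: 90% interval ({lo:.2f}, {hi:.2f})")
```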

Case 1. In this case, prediction is based on both the previous data (model runs and field data) as well as a new model run at the new input x at which prediction is desired.

SPOT WELD: For G=2, L=4.888 and C=29.44, and using the posterior mean û = 3.96, the prediction was yM(4.888, 29.44, 2, 3.96) = 7.16. The three sources of error in this prediction and their relative importance can be judged by decomposing the 90% tolerance interval into intervals corresponding to each estimated quantity. The 90% interval for yM(4.888, 29.44, 2, u) (with u being considered as the unknown and random quantity) is (6.50, 7.56); the interval for b(4.888, 29.44, 2) is (−1.00, 1.24); and the additional variability of the interval, induced by uncertainty in εF, is (−0.71, 0.71).

Figure 5: A posterior summary of the error associated with predictions in the Case 2 scenario (i.e., when only previous model runs are utilized). As a function of current for low (L = 4.0, left column) and high (L = 5.3, right column) loads, the top graphs show the model approximation ŷM(x, û) (solid line) and 90th percentile bands for the pure model predictions. The bottom row of the figure presents the same for the bias-corrected predictions ŷR(x). The dots indicate the observed field data.

Case 2. If it is desired to graph the uncertainty due to each of the sources of error, as a function of the inputs, it will typically be necessary to use the analysis based only on previous model runs.

SPOT WELD: The uncertainty associated with each of the unknown elements of the problem (u, b(x), and εF) is presented in Figure 6. The top graphs present percentiles for ŷM(x, u), and indicate the effect of the uncertainty in u. The second and third rows of graphs indicate the percentiles for b(L, C, G) and for εF, respectively. Interestingly, all three sources of uncertainty contribute comparable amounts (as measured by the width of the percentile bands) to the overall uncertainty. Clearly, ignoring any of these sources of uncertainty can lead to overconfidence in prediction. (Note that the constant lines corresponding to the residual error are a feature of the model used; it was assumed that the residual error does not depend on x.)

5.3 Outline of the Bayesian methodology

We first consider the case in which the computer model is fast (so yM is treated as a known function, and no model approximation is needed). We recall the modeling assumptions from Section 5.1:

yF(x) = yR(x) + εF

yR(x) = yM(x, u) + b(x)

εF ∼ N(0, 1/λF) .

Figure 6: A posterior summary of the contributions of each source of uncertainty to the overall uncertainty of predictions under a Case 2 analysis. The top graphs show pointwise 90th percentile bands for ŷM(x, u) (with u being considered as the unknown and random quantity) as a function of current for low (L = 4.0, left column) and high (L = 5.3, right column) loads. The middle row of graphs shows 90th percentile bands for b(L, C, G). The bottom row shows 90th percentile bands for εF.

These produce a multivariate normal density for the collection of all field data, yF, which we shall denote by f(yF | u, λF, b). (Strictly, we should write u∗ instead of u but, in the Bayesian approach, all unknowns are considered to be random, and so we drop the ∗ superscript for notational simplicity.) The prior distribution of the unknown elements u, λF, b of the model will be denoted by p(u, λF, b) and is described in Appendix D.2. Bayes theorem then yields the posterior density of these unknowns, given the data yF, as

p(u, λF , b | yF ) ∝ f(yF | u, λF , b)p(u, λF , b). (5.6)

To actually compute the posterior density, one would need to determine the normalizing constant that makes the expression on the right-hand side of (5.6) integrate to one. It will typically be necessary to deal with this posterior distribution by Markov chain Monte Carlo (MCMC) analysis (cf. Robert and Casella, 1999), however, and for this the normalizing constant is not needed. The result of the MCMC analysis will be, say, N draws from this posterior distribution of the unknowns u, λF and b. Call these samples u_i, λF_i and b_i, i = 1, . . . , N. From these samples, the posterior distribution of any quantity of interest can be estimated. (Thus Figure 3 is just a smoothed histogram arising from the samples of the u_i generated from the SPOT WELD posterior distribution.)

The estimate of the unknown u is now simply û, the average of the u_i. (This is the estimated posterior mean from the MCMC analysis.) Similarly, the estimated bias function is given by

b̂(x) = (1/N) ∑_{i=1}^{N} b_i(x) .

To predict the real process, yR(x), at any input x we have two options:

Model prediction: Here the prediction of the real process at input x is simply given by yM(x, û). We recommend using the posterior mean as the estimate of u, but the analysis can be done for any estimate, such as the maximum likelihood estimate or any ad hoc tuned estimate. The estimated bias of this prediction is given from the MCMC output by

b̂_û(x) = (1/N) ∑_{i=1}^{N} [yM(x, u_i) + b_i(x)] − yM(x, û) .   (5.7)

The variance, V_û(x), associated with the model prediction yM(x, û), is computed as

V_û(x) = [b̂_û(x)]² + (1/N) ∑_{i=1}^{N} [yM(x, u_i) + b_i(x) − ŷR(x)]² ,   (5.8)

where ŷR(x) is the bias-corrected prediction, computed as in (5.9). Note that it is necessary to use the MCMC computational analysis to obtain the estimated bias and variance of the prediction, so that there is no gain in computational efficiency from using the pure model predictor yM(x, û).

The posterior probability that yM(x, û) is within a specified tolerance τ of the true yR(x) is simply estimated by the fraction of the samples (u_i, b_i) for which |yM(x, û) − [yM(x, u_i) + b_i(x)]| < τ.

Bias-corrected prediction: It is optimal to use the bias-corrected predictor, given by the MCMC estimate of the posterior mean of the true process at x, namely

ŷR(x) = (1/N) ∑_{i=1}^{N} [yM(x, u_i) + b_i(x)] .   (5.9)

An alternative expression for the estimate of the bias, b̂_û(x), of the pure model prediction is

b̂_û(x) = ŷR(x) − yM(x, û) ,   (5.10)

thus making obvious its interpretation as the ‘bias’ of the predictor yM(x, û). The bias, b̂_û(x), of the pure model predictor is, in general, different from the prediction of the bias function b(x). The variance of the optimal predictor ŷR(x) is simply computed as

(1/N) ∑_{i=1}^{N} [yM(x, u_i) + b_i(x) − ŷR(x)]² = V_û(x) − [b̂_û(x)]² .   (5.11)

Note that for large bias, the reduction from the previous V_û(x) can be substantial. The posterior probability that ŷR(x) is within a specified tolerance τ of the true yR(x) is simply estimated by the fraction of the samples (u_i, b_i) for which |ŷR(x) − [yM(x, u_i) + b_i(x)]| < τ.

The difficulty with the above analysis is that it requires evaluation of yM(x, u_i) at each generated value of u_i (and also at each of the data inputs x_i), which is infeasible when model runs are expensive. It is then necessary to use the Gaussian process approximation to yM, described in Section 4, in order to carry out the computations. This (unavoidably) introduces additional uncertainty into the predictions. The analysis, however, is very similar to the one just presented; further details are given in Appendix D.2.
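To fix ideas, here is a sketch of the estimators (5.7)-(5.11) applied to synthetic posterior draws, with a fast stand-in for yM (or its GASP approximation); all numerical values are invented.

```python
# Section 5.3 estimators from (synthetic) MCMC draws u_i and b_i(x).
import numpy as np
rng = np.random.default_rng(4)

y_M = lambda x, u: u * np.sqrt(x)       # fast model or its GASP approximation
x = 1.5                                  # new input of interest
u_i = rng.normal(3.96, 0.4, 5000)        # posterior draws of u
b_i = rng.normal(0.34, 0.3, 5000)        # posterior draws of b(x)

u_hat = u_i.mean()                       # posterior-mean estimate of u
reality = y_M(x, u_i) + b_i              # draws of y_M(x, u_i) + b_i(x)
yR_hat = reality.mean()                                    # (5.9)
b_hat = yR_hat - y_M(x, u_hat)                             # (5.7), also (5.10)
V_u = b_hat**2 + np.mean((reality - yR_hat) ** 2)          # (5.8)
V_corrected = V_u - b_hat**2                               # (5.11)
tau = 1.0
tol_prob = np.mean(np.abs(yR_hat - reality) < tau)         # tolerance probability
print(yR_hat, b_hat, V_u, V_corrected, tol_prob)
```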

6 Feedback; Feed Forward (Step 6)

The analyses in Step 4 and Step 5 will contribute to the dynamic process of improving the model and updating the I/U map by identifying

• Model inputs whose uncertainties need to be reduced

• Needs (such as additional analyses and additional data) for closer examination of important regions or parts of the model

• Flaws that require changes in the model

• Revisions to the evaluation criteria.

In SPOT WELD, for instance, the posterior distribution of u (Figure 3) will now replace the uncertainty entry in the I/U map. Another aspect of feedback is use of the Step 4 and Step 5 analyses to further refine the validation process, e.g., to design additional validation experiments.

The feed-forward notion is to develop the capability to predict the accuracy of new models that are related to models that have been studied, but for which no specific field data are available. This will be done through utilization of the hierarchical Bayesian techniques introduced in Section 8. In CRASH, for example, the data available for centerpole impacts can be augmented through hierarchical modeling.

7 Functional Data

Often, data arise in functional form. For instance, in CRASH, the data arise as functions of time (see Figure 15). In SPOT WELD, the model-run data were given as a function of the number of weld cycles, but we ended up using only the output at 8 cycles in the analysis (because of data-quality issues), so a functional representation of the data was not needed.

We use t to denote the r-vector of arguments in functional data. In CRASH, t is time, a scalar quantity (i.e., r = 1). In the remainder of this section, we restrict attention to the scalar case, although the more general situation can be handled similarly. Also, for simplicity of notation, we assume in this section that there are no u variables (true for CRASH, which will be the test bed application here).

We can now add t to the list of model inputs, and write the true process value at (x, t) as yR(x, t), the model output at (x, t) as yM(x, t), etc. As before, reality is linked to model output by

yR(x, t) = yM (x, t) + b(x, t). (7.1)

In practice we cannot work with complete functional data, and it is necessary to discretize the data. One approach is simply to run separate analyses for each of a small set of t. This is not recommended, unless only a small set of t is of interest. (For example, in SPOT WELD, interest primarily focused on evaluation of model predictions at t = 8 cycles.) A second approach would be to represent the functions that arise through a basis expansion (e.g., a polynomial expansion), taking only a finite number of terms of the expansion to represent the function. The coefficients of the terms in this expansion would then be additional input variables in the analysis. This approach might well be optimal in certain settings, but we turn instead to the most direct possibility.

The most direct approach is to lump t with x and model yM and b with (single) Gaussian processes defined on the joint input space. Since we only consider discrete input values here, we must further pretend that we have only observed the function at a discrete set of points, DT = {t1, . . . , tT}. In essence, we are thus ‘throwing away available data.’ However, it is clear that, if T is chosen large enough and the points at which we record the function are chosen well (see Section 10.2), then the function values at these T points will represent the function very well. While this keeps the dimension of the Gaussian processes reasonable (only one new input is added), the number of observations becomes much larger; at each input value x in the data set, there are now T function evaluations that must be included as data. The total number of observations thus becomes (m + l)T, where m and l are the number of x points in the model-run and field data, respectively. Computational complexity grows rapidly with the size of the data set so, at first sight, this approach is untenable.

Luckily, if we choose the same set, DT, of t points for each of the x inputs in the model-run or field data, and we make a reasonable simplifying assumption as to the nature of the Gaussian process correlations involving t, a considerable simplification is effected that reduces the computational burden to something like the sum of the burdens for (m + l) and T data points. This is discussed in Appendix E.1.
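The simplification rests on a separable (Kronecker) correlation structure. Under the assumption that the correlation matrix factors as R = Rx ⊗ Rt (our reading of the Appendix E.1 device; the sketch below is illustrative, with invented sizes and length-scales), solves against the (mT) × (mT) matrix reduce to solves against the two small factors:

```python
# Kronecker sketch: with a common t-grid and separable correlation
# R = kron(R_x, R_t), big solves reduce to small ones. Sizes illustrative.
import numpy as np

def corr(a, length):
    return np.exp(-((a[:, None] - a[None, :]) / length) ** 2)

m, T = 12, 19
Rx = corr(np.linspace(0.0, 1.0, m), 0.4) + 1e-8 * np.eye(m)
Rt = corr(np.linspace(0.0, 65.0, T), 15.0) + 1e-8 * np.eye(T)
Y = np.random.default_rng(5).normal(size=(m, T))   # functional data, row per x

# R^{-1} vec(Y) via the two small factors only: Z = Rx^{-1} Y Rt^{-1}.
Z = np.linalg.solve(Rx, np.linalg.solve(Rt, Y.T).T)

# Agreement with the brute-force (m*T) x (m*T) solve:
brute = np.linalg.solve(np.kron(Rx, Rt), Y.reshape(-1))
print(np.allclose(brute, Z.reshape(-1)))           # True
```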

This analysis now proceeds as in Sections 4 and 5, and produces all the estimates and tolerance bounds discussed therein. For instance, the GASP approximation, ŷM(x, t), of the computer model yM(x, t) can be computed, and pointwise error bands given. Thus, for fixed x, the 80% pointwise posterior error bands are calculated by choosing L1(t), L2(t) so that 80% of the MCMC samples lie in IM = [ŷM(x, t) − L1(t), ŷM(x, t) + L2(t)]; the interpretation is then that “the probability is .80 that the computer model output (if run) would lie within the interval IM.”

Inference for a specific evaluation criterion can also be made (see Appendix E.2). In the examples of this section, the CRITV values arising from use of the GASP model approximation and from bias-corrected prediction will be given.

Figure 7: 80% posterior intervals of SDM velocity for yM, arising from the GASP model approximation, and for predicting yR, using bias-corrected prediction (in the Case 2 scenario in which only previous model runs are used in the analysis).

CRASH: We restrict attention to evaluation when the impact velocity is 56.3 km/h. The evaluation criterion, CRITV, is “SDM velocity calculated 30ms before SDM displacement, DISP, reaches 125mm.” We take DT to be the set of 19 time points t = 1, 3, . . . , 15, 17, 20, 25, . . . , 65ms. More points are chosen in the region t < 20ms since information from this region is more important in estimating CRITV. More t points could be used at some computational expense, but this selection is adequate because SDM velocity is comparatively smooth and information at times greater than 65ms is irrelevant for the context at hand.

Since, for any v, the relative velocity is 0 at the time of impact t = 0, and since the slopes are roughly proportional to v, we assume that the prior mean of yM has the form θLvt. We carry out an MCMC analysis to approximate the posterior distribution of the parameters (see Bayarri et al. (2002)). Pointwise posterior intervals are computed. Figure 7 shows 80% pointwise posterior intervals for yM(56.3, ·) and yR(56.3, ·), corresponding to relative SDM velocity in the range t < 80ms. The greatest uncertainty in these distributions occurs for t > 65ms, the region where no data were observed. Note that the error bands about yM(56.3, ·) only reflect the uncertainty in the GASP approximation to the model, while the error bands about yR(56.3, ·) incorporate all the uncertainty in prediction of reality.

Figure 8: 80% posterior intervals for SDM velocity bias, at 56.3km/h impact velocity.

A simple graphical way of judging the validity of the model is to study tolerance bands for the estimated bias, as given in Figure 8. This shows a small predicted bias (ranging from 0 to 2km/h, with mild uncertainty), slowly increasing as a function of t.

Figure 9 illustrates the uncertainty in the criterion of interest, CRITV. The lower figure shows the uncertainty of prediction of the real CRITV, using the optimal bias-corrected estimate (Case 2). This would be the result of primary interest to the engineer. The mean and standard deviation of this posterior distribution of CRITV are −5.21 and 0.33, respectively (so that −5.21 would be the bias-corrected estimate).

To see how much of this uncertainty is due to the use of the GASP approximation to the computer model, the upper figure presents the distribution of CRITV that arises from the uncertainty in the GASP approximation. The mean of −5.11 is similar to that for the real prediction (suggesting that there is minimal bias), and the standard deviation is 0.13, indicating that most of the uncertainty in the real CRITV prediction is due to sources other than the GASP model approximation. (The standard deviation 0.13 in part reflects the fact that only previous model runs were used, i.e., the model was not re-run at the desired input v = 56.3, and in part reflects the uncertainty that arises from discretizing t; note that this part of the uncertainty could be eliminated with more computational effort.)

Figure 9: Posterior distributions for CRITV.

Since the bias is a function of the impact velocity v, it should be examined at different values of v. Figure 10 shows the bias for a 30km/h impact. The bias is clearly larger in the 20-59ms interval than it was for the 56.3km/h impact. The mean and standard deviation of CRITV are now (−6.53, 0.38) for yM (GASP estimation of the model) and (−6.56, 0.49) for yR (bias-corrected prediction of the real CRITV). It is interesting to note that the bias seen in Figure 10 does not have a serious effect on the evaluation criterion (since both the GASP model approximation and the prediction of reality are very close). This serves as a potent reminder that the validity of a computer model can depend strongly on the evaluation criterion of interest; while there is a clear indication that the computer model does have bias for SDM velocity at lower impact velocities and larger times, this bias disappears if only the CRITV criterion is of interest.

8 Extrapolation Past the Range of the Data

One of the main motivations for using computer models is the hope that they can adequately predict reality in regions outside the range of the available data. We have advocated the use of bias-correction, based on field data, to improve (typically biased) computer model predictions. The difficulty is that the estimates of bias may not extrapolate well outside the range of the actual field data. When this is the case, the Bayesian methodology will tend to return very large tolerance bands; while one is then at least not making misleading claims of accuracy, the large bands may make assertion of predictive validity of the model impossible. (On a technical note, the best way to minimize the size of the tolerance bands in extrapolation is to choose the mean, Ψ(·)θL, of the model Gaussian process and the mean of the bias Gaussian process to be as accurate representations of the real process as possible.)

One ‘solution’ to this difficulty is to simply make the scientific judgement that the bias estimates do extrapolate. For instance, in CRASH, the entire analysis was performed for a fixed vehicle configuration. If, say, an element of the vehicle frame were increased in thickness by 5%, one might reasonably judge that the bias estimates would extend to this domain, even though no field data were obtained for varying thicknesses of the element. Of course, any such assumption should be reported, along with the conclusions about the predictive accuracy of the model.

Figure 10: 80% posterior intervals for SDM velocity bias, at 30km/h impact velocity.

Bayesian methodology allows a weaker (and typically much more palatable) way of extrapolating past the range of the data. The idea is to model the new scenario as being related to that (or those) for which data are available, but not to insist that the situations are identical in terms of bias and predictive accuracy. There are many variants on this theme; here we consider one that applies to the CRASH model and is typically called hierarchical modeling.

Hierarchical modeling applies most directly to scenarios in which there are K different function outputs, each coming from different inputs to a computer model (or even from different computer models). Each of these functions can be modeled as was done in Section 7, through Gaussian process priors. We will be particularly concerned with settings where the Gaussian processes for yM and b can be assumed to share common features, typically where the parameters governing the priors are drawn from a common distribution. This induces connections among the individual models and enables us to combine information from the separate models, sharpen analyses and reduce uncertainties. Clearly, disparate computer models are unlikely to be usefully treated this way but, for CRASH, such a hierarchical approach will be seen to be useful.

Implementation of these ideas will depend heavily on what data, both computer and field, are available, as well as on the legitimacy of the assumptions imposed. Here we informally state and comment on these assumptions for the simplest structure we will impose. (Full details can be found in Appendix F.)

Assumption 1. The smoothness of the model approximation processes is identical across the K models being considered. This is a very reasonable assumption in the contexts for which hierarchical modeling would typically be employed.

Assumption 2. The variances of the model approximation processes are equal across the various cases. Similarly, we assume that the variances, 1/λF, of the field data for all K cases are equal. Again, this is typically reasonable.

Assumption 3. The relation among the means of the Gaussian processes for the K computer models is quantified by assuming a common prior distribution for the θL_i, as specified in Appendix F. This prior distribution will allow the θL_i to vary considerably between the K situations, but still ensures that information is appropriately pooled in their estimation.

Assumption 4. The biases for the K situations are assumed to be related in a fashion described by a parameter q, whose value must be specified. This parameter describes the believed degree of similarity in the biases for the K different computer models; indeed, 1 + q can be interpreted as an upper bound on the believed ratio of the standard deviations of the biases, or, stated another way, the proportional variation in the bias is q. (See Appendix F for details.) Specifying q = 0.1 states that the biases are expected to vary by about 10% among the various cases being considered.

Note that the specification of q is a judgement as to the comparative accuracy of the K different computer models, as opposed to their absolute accuracy (which need not be specified). The reason we require specification of this parameter by the engineer/scientist is that there is typically very little information about this parameter in the data (unless K is large). Note that specifying q to be zero could be reasonable if one is unsure as to the accuracy of the computer models but is quite sure that the accuracies are the same across the K cases.
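The pooling in Assumption 3 behaves like familiar hierarchical shrinkage. The following minimal empirical-Bayes sketch (invented numbers, between-case variance treated as known) is only an analogy to the full analysis in Appendix F, showing per-case estimates being pulled toward a common mean:

```python
# Hierarchical shrinkage sketch: theta_i ~ N(mu, tau2), observed with s.e. se_i.
# All numbers invented; tau2 treated as known for simplicity.
import numpy as np
theta_hat = np.array([1.2, 0.8, 1.5, 0.9])     # per-case estimates of theta_L
se = np.array([0.35, 0.15, 0.65, 0.75])        # their standard errors
tau2 = 0.25                                     # between-case variance (assumed)

mu = np.average(theta_hat, weights=1.0 / (se**2 + tau2))  # pooled mean
w = tau2 / (tau2 + se**2)                       # shrinkage weights
theta_pooled = w * theta_hat + (1.0 - w) * mu   # pooled per-case estimates
print(theta_pooled)          # pulled toward mu, most strongly when se is large
```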

CRASH: The analysis performed earlier was on the data and model for rigid barrier, straight frontal impact. By use of hierarchical modeling we can simultaneously treat rigid barrier, left angle and right angle impacts, as well as center pole impact. The analyses and predictions are made for a 56.3km/h impact (this is at the high end of the data). For illustration, we choose q = 0.1.

Figure 11 shows the differing posterior predicted SDM velocity curves and pointwise uncertainty bands for each of the four barrier types. The straight frontal and left angle posterior intervals in Figure 11 are tight because there are data close to 56.3km/h for these barrier types (so the analysis is effectively more like a Case 1 analysis, i.e., based on a new model run at the desired input, than a Case 2 analysis). In contrast, the intervals are not tight for the other barriers because data near 56.3km/h are lacking. This reinforces the value of making a model run at a new desired input. Figure 7 and the straight frontal pictures in Figure 11 are very similar.

Figure 12 gives the estimates of the four bias functions and the associated pointwise uncertainties. Because of the large uncertainties in the bias estimates, the only case in which the bias seems clearly different from zero is for left angle impacts, after 43ms. (While we cannot clearly assert that there is bias in the other cases, the tolerance bounds for predictions will be quite large, reflecting the uncertainty in the bias estimates.)

We again consider the criterion CRITV = SDM velocity 30ms before SDM displacement is 125mm. Table 4 presents the means and standard deviations of CRITV for each barrier type, for the GASP model approximation and for the bias-adjusted prediction of CRITV. The corresponding posterior distributions for CRITV are available, but omitted here.

Figure 11: Pointwise 80% posterior intervals for 4 barrier types, based on the hierarchical model, in the Case 2 scenario in which only previous model runs are utilized.

Angle as an additional input: If we only consider the three rigid-barrier impacts (frontal, right angle and left angle) and ignore the center pole impact and data, we could proceed without use of hierarchical modeling by incorporating the angle of impact, xA, as an input to the model. The smoothness assumption required for the Gaussian process analysis is plausible: it is reasonable to assume that small changes in the angle will result in small changes in the velocity-time curve, so that yM is a smooth function of xA.

Combining the data from left angle (xA = 0.0), right angle (xA = 1.0), and straight frontal (xA = 0.5) barrier impacts led to computations that were considerably more expensive than in the hierarchical model, because we now need to invert 26 × 26 matrices instead of the smaller matrices encountered in dealing with the individual barrier types.

Figure 12: Pointwise 80% posterior intervals for bias, based on the hierarchical model, in the Case 2 scenario where only previous model runs are utilized.

                     Hierarchical model              Using frontal data only
Barrier type         CRITV for yM    CRITV for yR    CRITV for yM    CRITV for yR
left angle           -6.08 (0.34)    -6.34 (0.49)
straight frontal     -5.13 (0.13)    -5.22 (0.30)    -5.11 (0.13)    -5.21 (0.33)
right angle          -6.89 (0.65)    -6.80 (0.96)
center pole          -6.55 (0.74)    -6.54 (0.91)

Table 4: Posterior mean and standard deviation of CRITV, arising from the GASP model approximation estimate of yM, and arising as the bias-corrected prediction (yR) of the real CRITV (in the Case 2 scenario where only previous model runs are utilized).

Comparison of the results in Tables 5 and 4 generally shows close agreement between using angle as input and the hierarchical model. There are differences associated with the right angle CRITV, reflecting the paucity of data for that angle. (The hierarchical model makes weaker assumptions about the relationship between the various cases than does incorporation of angle as an input variable, and hence is more affected by the shortage of data for right angle.)

The tolerances in Table 6 refer to relative velocities, but ranges for other evaluation criteria can be easily computed. For example, the tolerance range for “time at which SDM displacement is 125 mm” corresponding to the frontal analysis is 15.79 ± 1.03ms.

Figure 13: 80% pointwise posterior intervals for SDM velocity, using angle as an additional input, in the Case 2 scenario of using only previous model runs.

9 Merging Predictive and Physical Approaches to Validation

9.1 The probability that the computer model is correct

In the introduction, the two philosophies towards validation of a computer model were discussed. We have focused on the predictive approach, that of determining the accuracy of the predictions of the computer model, assuming that some bias exists. In the physical school, a modeler who has carefully constructed and exhaustively tested each component of a model (including component interfaces) might argue that the model is correct by construction, i.e., that there can be no bias. Such claims should be supported by at least some confirming data, but how much confirming data is needed?

Barrier type         CRITV for yM    CRITV for yR
left angle           -6.04 (0.36)    -6.29 (0.53)
straight frontal     -5.12 (0.15)    -5.24 (0.30)
right angle          -6.88 (0.64)    -6.69 (0.90)

Table 5: Posterior mean and standard deviation of CRITV, arising from the GASP model approximation estimate of yM, and arising as the bias-corrected prediction (yR) of the real CRITV (in the Case 2 scenario where only previous model runs are utilized).

Barrier type         Hierarchical model    Angle input xA    Frontal data only
left angle           -6.08 ± 0.70          -6.04 ± 0.76
straight frontal     -5.13 ± 0.40          -5.12 ± 0.41      -5.11 ± 0.43
right angle          -6.89 ± 1.27          -6.88 ± 1.17
center pole          -6.55 ± 1.18

Table 6: Posterior mean and 80% tolerance range for CRITV, arising from the GASP model approximation estimate of yM.

This same question arises in the pure view of the scientific process. A scientist proposes a new theory, which makes precise predictions about a process, say yM(x) ± 0.0001 at input x. Other scientists then try to devise an experiment that can test this theory, i.e., an experiment that, for some input x∗, will provide a field observation yF(x∗) that is within, say, 0.00001 of the true process value yR(x∗). If the experiment is conducted and yF(x∗) is within ±0.0001 of yM(x∗), then the scientific theory is viewed as being validated. The key to this scientific process is that, if the scientist makes even one very precise prediction, and the prediction turns out to be true, then that would seem to be considerable evidence in favor of the hypothesis. If the prediction of the scientist were not very precise, then a single observation could disprove, but not really confirm, the theory.

The natural language in which to discuss and implement these ideas is the Bayesian language. The proposed new theory (or proposed computer model) is M0, and one asks the question (after seeing one or more field observations) “What is the probability, given the data, that M0 is correct?” This question can be asked – and answered – through the Bayesian approach, and the result behaves as the scientific intuition from the previous paragraph would suggest. In particular, this probability can be quite high in the scientific context of a precise theory, after even one confirming precise field observation, while it will not be high in the case of an imprecise theory (or an imprecise field observation).

This is a problem of hypothesis testing or model selection. In the classical approach to hypothesis testing, one can essentially show that M0 is false, if the data so suggest, but it is much harder to show that M0 is true. The Bayesian approach does allow a direct answer to this primary question of interest.

Several ingredients are needed to implement the Bayesian approach.

1. A prior probability, π0, that M0 is true. This can, of course, vary from one individual to another. The modeler might feel π0 to be quite high, while a skeptic might judge it to be low. Often, however, the default choice π0 = 1/2 is made, in order to ‘see what the data have to say.’

2. An alternative model M1.

3. Suitable prior probability distributions on unknown parameters of M0 and M1.

In the context of evaluation of computer models, we have already constructed these needed ingredients. In particular,

• The prior distribution on the parameters of the computer model, M0, is that provided by the I/U map.

• The alternative model, M1, is the model we constructed in Sections 5.1 and 7, which includes the bias term b(x). Full details can be found in Bayarri et al. (2002).

• The prior distribution on the parameters of the alternative model (including the unknown bias) is as constructed for the predictive validation.

The result of the analysis is called the posterior probability that M0 is true, and will be denoted by P(M0 | y), where we generically let y refer to all the data.

CRASH: Analysis in this test bed resulted in a posterior probability of near 0 that the computer model is true (assuming an initial prior probability of π0 = 1/2). This was actually also apparent from earlier graphs of the estimated bias, and illustrates an important point: if a computer model has statistically significant bias over any part of the domain under study, the model will have essentially zero posterior probability of being correct. This, of course, is as it should be, but does point out the reason that looking at the predictive accuracy of the model (which can vary over the input domain) is greatly superior to simply asking yes/no questions.

Before discussing the details of the analysis, another feature of the Bayesian approach deserves highlighting, namely that the conclusions regarding accuracy of predictions will now be a weighted average of the accuracy statements arising from M0 and M1. For instance, if it is desired to know the probability that yR(x∗), at specified input x∗, lies within the interval (8, 10), the answer would be

P (M0 | y)P (8 < yR(x∗) < 10 | M0) + (1 − P (M0 | y))P (8 < yR(x∗) < 10 | M1) ,

with the P(8 < yR(x∗) < 10 | Mi) being computable from our previous analyses. In this expression, we see a complete merging of the physical and predictive approaches to model validation. The physical approach would produce the accuracy statement P(8 < yR(x∗) < 10 | M0), while the predictive approach would produce the accuracy statement P(8 < yR(x∗) < 10 | M1). The overall correct answer is their weighted average, with the weights being the posterior probabilities that each of the models is true.

9.2 Implementation

In carrying out the Bayesian computation of P(M0 | y) for a slow computer model, the approximation introduced in Section 4 will be required. In this case, M0 is like the overall model M1, but with the bias function b(·) = 0.

Let φi be the full parameter vector for model Mi, i = 0, 1 (including all parameters of the mean functions and the Gaussian processes involved). In addition, for model Mi, i = 0, 1, denote by fi(y | φi), pi(φi) and pi(φi | y) the likelihood function of the full data vector y (both computer-model and field data), the prior density, and the posterior density for the parameter vector, respectively. The form of the likelihood function and the approaches to prior specification and posterior inference, using MCMC methods, for model M0 are similar to the corresponding ones for model M1, described earlier and detailed in Bayarri et al. (2002).

Letting π1 = 1 − π0 denote the prior probability of M1, Bayes theorem gives the posterior probability of M0 as

P(M0 | y) = π0 m0(y) / [π0 m0(y) + π1 m1(y)] ,   (9.1)

where

mi(y) = ∫ fi(y | φi) pi(φi) dφi   (9.2)

is the marginal likelihood for model Mi, i = 0, 1.

Although we are typically able to integrate analytically over part of the parameter vector φi, closed-form expressions for the integrals in (9.2) are not available. However, numerical evaluation, based on Monte Carlo estimates, is feasible using the posterior samples from pi(φi | y), i = 0, 1, and existing approaches to this problem (see, e.g., Chib and Jeliazkov, 2001, and references therein).
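Once the two (log) marginal likelihoods have been estimated, (9.1) is a one-line computation; in the sketch below the marginal-likelihood values are invented inputs, standing in for, e.g., Chib-style MCMC estimates.

```python
# Posterior probability of M0 from (9.1), on the log scale for stability.
import numpy as np
log_m0, log_m1 = -142.3, -128.7     # estimated log marginal likelihoods (invented)
pi0 = 0.5                           # prior probability of M0
log_odds = np.log(pi0 / (1.0 - pi0)) + (log_m0 - log_m1)
P_M0 = 1.0 / (1.0 + np.exp(-log_odds))
print("P(M0 | y) =", P_M0)          # posterior probability that M0 is true
```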

Note that use of improper priors is typically not possible when interest lies in computation of marginal likelihoods of models. However, given specific structure of the models, improper priors for some of the parameters can be employed. (See Berger, De Oliveira and Sanso, 2001, for discussion of this issue and additional references.) In our setting, it is, in general, possible to use the improper prior, given in Bayarri et al. (2002), for θL, the vector of parameters associated with the mean function of yM(·).

9.3 Merging numerical and statistical modeling

When there is a significant amount of field data available, a statistician might consider simply modeling the field data by statistical methods, forgoing utilization of the computer model of the process. It is natural to ask if such pure statistical modeling can be merged with the computer model to produce improved predictions. The answer is yes, and involves incorporation of the statistical modeling into both the mean function of the bias and the error structure of the data. We do not consider this further here, as the focus of the paper is on validation of the computer model.


10 Additional Issues

10.1 Computer model simplification

Sensitivity analysis

Sensitivity analyses focus on ascertaining which inputs most strongly affect outputs, a key tool in refining the I/U map. There are 'local' and 'global' approaches to sensitivity analysis. The local approach is based on derivatives of model outputs with respect to model inputs. This can sometimes be done by automatic differentiation (actual line-by-line differentiation within the computer code), but is almost always a difficult process. The global approach is based on statistical analysis comparing the output and input distributions. There are many versions of such global analyses, but the most commonly used are variants of 'analysis of variance' (ANOVA) decomposition, to assess which input variables have the greatest effect on the variance of the output distributions. Models based on the most important variables can then be studied, with the less important variables fixed (at, say, their prior means). Methods of model simplification based on principal component analysis (or POD in the applied mathematics literature) also fall in this domain. (See Saltelli et al., 2000, for discussion and examples of output and sensitivity analyses.)

Again, model approximations are needed for expensive codes and these, in turn, require special methods of global sensitivity analysis. Elaboration of these methods will not be treated here, although there is code by W. Welch which provides an ANOVA decomposition for the model approximation mentioned in Step 4, based on maximum likelihood estimation of the parameters.
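To illustrate the global, variance-based idea, here is a minimal Monte Carlo sketch of first-order and total sensitivity indices computed on an inexpensive surrogate; the toy function, sample sizes and estimators (standard pick-freeze forms) are our own illustration, not part of the framework:

import numpy as np

def surrogate(Z):
    # Toy stand-in for an inexpensive model approximation, inputs in [0,1]^3
    return np.sin(2 * np.pi * Z[:, 0]) + 0.5 * Z[:, 1] ** 2 + 0.1 * Z[:, 2]

rng = np.random.default_rng(0)
n, d = 100_000, 3
A, B = rng.random((n, d)), rng.random((n, d))
yA, yB = surrogate(A), surrogate(B)
var_y = np.concatenate([yA, yB]).var()

for j in range(d):
    ABj = A.copy()
    ABj[:, j] = B[:, j]              # resample only input j
    yABj = surrogate(ABj)
    S_j = 1.0 - 0.5 * np.mean((yB - yABj) ** 2) / var_y   # first-order index
    ST_j = 0.5 * np.mean((yA - yABj) ** 2) / var_y        # total-effect index
    print(f"input {j + 1}: S = {S_j:.2f}, ST = {ST_j:.2f}")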

10.2 Utilization of transformations

• Often a transformation of the data is helpful in the statistical modeling. For instance, in CRASH, it was helpful to consider relative velocity (subtracting the initial impact velocity from all data) instead of raw velocity.

• One can perform a change in time scale to deal with nonstationarity in time. For instance, define a new time scale by taking a functional output yM(x0, t), at a typical x0, and defining

t^* = \int_0^t \left| \frac{\partial}{\partial v} y^M(x_0, v) \right| dv .

A similar rescaling could be done for any continuous variable, if desired. (A small numerical sketch of this rescaling is given after this list.)

• Transformations of the Gaussian process y(z) can be made, such as y∗(z) = g(z)y(z). This is a new Gaussian process with mean multiplied by g(·) and a covariance function that is often of suitable form.

CRASH: The process is 'tied down' at time 0, and a smaller variance is expected there. Choosing, say, g(t) = t/(10 + t) will result in a Gaussian process with near zero variance initially, yet a process that behaves essentially like the previous stationary process once one is significantly far from 0.
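As promised above, a minimal numerical sketch of the time rescaling, using the trapezoidal rule on a hypothetical stand-in for yM(x0, ·):

import numpy as np

# Hypothetical functional output y^M(x0, t) on a regular time grid
t = np.linspace(0.0, 100.0, 501)          # e.g., time in msec
y = 15.0 * np.exp(-t / 25.0)              # stand-in velocity curve

speed = np.abs(np.gradient(y, t))         # |d y^M(x0, v) / dv|
# Cumulative trapezoidal rule gives the new clock t*(t); it advances
# quickly where the output changes rapidly and slowly where it is flat.
t_star = np.concatenate(
    ([0.0], np.cumsum(0.5 * (speed[1:] + speed[:-1]) * np.diff(t))))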


10.3 Modularization

The basic idea is to first do the Bayesian analysis of all the model data, ignoring the contribution of the field data in estimating GASP model approximation parameters (including the θL and those relating to the functional parameters t). Then, treating the model parameters (other than tuning parameters) as specified by the resulting posterior distribution (or, possibly, by their maximum likelihood estimates), one incorporates the field data by a separate Bayesian analysis. The motivation and advantages of the modular approach are as follows.

1. This is a Bayesian version of 'partial likelihoods': if f(data | θ, ν) = f(data | θ)g(data | ν, θ), partial likelihood uses only f to estimate θ, which then gets plugged into g for further inference about ν.

2. Field data can affect the GASP model approximation parameters in undesirable ways, resulting in pushing u to 'bad' regions of its space. The modular approach can prevent this from happening.

3. This easily generalizes to systems with several model components, Mi, each of which has separate model-run data. Dealing first with the separate model-run data, in setting up the GASP model approximations, and incorporating the field data only at the tuning/validation stage, makes for an easier-to-understand and computationally more efficient process.

Details concerning the modular approach can be found in Bayarri et al. (2002).

10.4 Multivariate output functions

There are a variety of possible ways to extend the analysis to multivariate output, important for situations such as the following, but we do not consider them here.

CRASH: The evaluation criterion "velocity at the center of the radiator, RDC, 30ms before SDM displacement reaches 125mm" involves a combination of two sensors, one located at the radiator center and the other under the driver's seat. This evaluation criterion thus requires an analysis of bivariate model output.

10.5 Updating

The model approximation is exact only at the observed model-run data points. Sometimes the values of the model output are also constrained at other points, and it can be important to include such constraints in the analysis. This is best done 'after the fact' by conditioning the unconstrained answer on the known constraints. (Trying to incorporate the constraints at the beginning often fatally disrupts the computations.)

CRASH: A problem arises if we wish to predict the velocity-time curve when the initial velocity v0 is between two data curves: at time t = 0 we know that the relative velocity is 0, but the Gaussian process approximation only assumes that this is true in mean. If one tried to add the constraint that all initial relative velocities were zero, the Kronecker product computational simplification would no longer apply, resulting in an impractical MCMC algorithm. A compromise is to introduce the information that the relative velocity is initially 0 only at the prediction stage, which is straightforward to do.

Another crucial instance of the conditioning idea is when an additional model data point, yM(x, u), becomes available. Indeed, this is how the actual model can be utilized in future predictions, according to the recommended 'Case 1' approach to validation. The difficulty is that one can then rarely go back and re-run the entire MCMC computation with this new data point. The solution is simply to condition the existing posterior on this additional data point, using it in the Kalman filter part of the analysis, but not to obtain the posterior for tuning parameters or parameters in the Gaussian processes. Details can be found in Bayarri et al. (2002).
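The flavor of such an update can be seen in a minimal one-dimensional sketch: with the Gaussian process hyper-parameters held fixed (in the framework, at their existing posterior values), a new model run is absorbed simply by augmenting the conditioning set. The zero-mean process, kernel and numbers below are hypothetical.

import numpy as np

def corr(z1, z2, beta):
    # Exponential correlation in one dimension (a special case of (C-1))
    return np.exp(-beta * np.abs(z1[:, None] - z2[None, :]))

Z = np.linspace(0.0, 1.0, 6)          # existing design points
y = np.sin(3 * Z)                     # existing model runs
beta = 8.0

# A new model run arrives: condition on it directly, with hyper-parameters
# unchanged, instead of re-running the MCMC.
Z_aug = np.append(Z, 0.55)
y_aug = np.append(y, np.sin(3 * 0.55))

z_pred = np.array([0.5])
r = corr(z_pred, Z_aug, beta)
mean = r @ np.linalg.solve(corr(Z_aug, Z_aug, beta), y_aug)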

10.6 Accounting for numerical instability and stochastic inputs

Sometimes numerical instability in the computer model can be modeled by adding an additional random noise term to the GASP model approximation (often called a 'nugget' in the Gaussian process literature). Likewise, stochastic inputs can often be handled by simply enriching the stochastic structure of the GASP model approximation. We do not consider these generalizations here.

REFERENCES

Bayarri, M. J., Berger, J. O., Higdon, D., Kennedy, M. C., Kottas, A., Paulo, R., Sacks, J., Cafeo, J. A., Cavendish, J., Lin, C. H., and Tu, J. (2002). A Framework for Validation of Computer Models. Technical Report #128, National Institute of Statistical Sciences.

Berger, J. O., De Oliveira, V., and Sanso, B. (2001). Objective Bayesian Analysis of Spatially Correlated Data. Journal of the American Statistical Association, 96, 1361–1374.

Cafeo, J. A. and Cavendish, J. C. (2001). A Framework For Verification And Validation Of Computer Models and Simulations. Internal General Motors document; to be published.

Chib, S. and Jeliazkov, I. (2001). Marginal Likelihood From the Metropolis-Hastings Output. Journal of the American Statistical Association, 96, 270–281.

Easterling, R. G. (2001). Measuring the Predictive Capability of Computational Models: Principles and Methods, Issues and Illustrations. Sandia National Laboratories Report SAND2001-0243, February, 2001.

Kennedy, M. C. and O'Hagan, A. (2001). Bayesian Calibration of Computer Models (with discussion). Journal of the Royal Statistical Society, Series B, 63, 425–464.

McKay, M. D., Conover, W. J., and Beckman, R. J. (1979). A Comparison of Three Methods for Selecting Values of Input Variables in the Analysis of Output From a Computer Code. Technometrics, 21, 239–245.

Oberkampf, W. L. and Trucano, T. (2000). Validation Methodology in Computational Fluid Dynamics. AIAA, 2000-2549.

Pilch, M., Trucano, T., Moya, J. L., Froehlich, G., Hodges, A., and Peercy, D. (2001). Guidelines for Sandia ASCI Verification and Validation Plans - Content and Format: Version 2.0. Sandia National Laboratories Report SAND2000-3101, January, 2001.

Roache, P. J. (1998). Verification and Validation in Computational Science and Engineering. Hermosa Publishers, Albuquerque.

Robert, C. P. and Casella, G. (1999). Monte Carlo Statistical Methods. Springer-Verlag, New York.

Sacks, J., Welch, W., Mitchell, T. J., and Wynn, H. P. (1989). Design and Analysis of Computer Experiments. Statistical Science, 4, 409–435.

Saltelli, A., Scott, M., and Chan, K., eds. (2000). Sensitivity Analysis. Wiley, Chichester.

Searle, S. R. (1982). Matrix Algebra Useful for Statistics. Wiley, New York.

Trucano, T., Pilch, M., and Oberkampf, W. O. (2002). General Concepts for Experimental Validation of ASCI Code Applications. Sandia National Laboratories Report SAND2002-0341, March, 2002.

Wang, P. C. and Hayden, D. B. (1999). Computational Modeling of Resistance Spot Welding of Aluminum. Research Report R&D-9152, GM Research & Development Center.

A Resistance Spot Weld Process Model

A.1 Introduction

In resistance spot welding, two metal sheets are compressed by water-cooled copper electrodes, under an applied load, L. Figure 14 is a simplified representation of the spot weld process, illustrating some of the essential features for producing a weld. A direct current of magnitude C is supplied to the sheets via the two electrodes to create concentrated and localized heating at the interface where the two sheets have been pressed together by the applied load (the so-called faying surface).

A.2 The welding process

The welding process consists of six steps:


Figure 14: Resistance spot welding process

1. A load, L, is first applied to the electrodes, producing an elastic and plastic deformation of the sheets. The resulting deformation brings into contact different areas at the electrode-sheet and faying surfaces, the size of which depends on the magnitude of the applied load, the yield stress, σS, of the sheet metal and Young's modulus, E, of the sheet and electrodes.

2. After the compression step, imposing a voltage drop across the electrodes generates a current of magnitude C. The current passes through the electrodes, sheets and faying surface. Both the electrodes and the sheets have well-defined temperature-dependent values of electrical and thermal conductivity, σ and κ, respectively.

3. Because of the current flow, heat will be generated and temperatures in the system will start to increase at a rate that depends on the local value of the resistance. The resistance offered at the faying surface is particularly critical in determining the magnitude of heat generated. If the applied load is too high, the two sheets will be pressed tightly together at the faying surface, producing little resistance and low generation of internal heat. This inhibits melting and nugget formation and growth. If the load is too low, then the air gap between the two sheets at the faying surface will be large, producing a high resistance, high heating and possible uncontrolled nugget growth (expulsion and electrode degradation).

4. The physical properties of the materials will change locally as a consequence of the local increase in temperature. Young's modulus and the yield stress of the sheet will fall (that is, the metal will "soften"), resulting in more deformation and an increase in the size of the faying contact surface. At the same time, the electrical and thermal conductivities will decrease as the temperature rises; all of which will affect the rate of heat generation and removal by conduction away from the faying surface.

5. If the applied current is high enough, sufficient heat will be generated to result in melting, first at the faying surface and then in an increasing volume of material about the faying surface. If the melt zone becomes too large, weld metal expulsion will occur.

6. When the current is turned off and the melt zone allowed to cool and quench, a nugget is formed and a spot weld results.

The thermal/electrical/mechanical physics of the spot weld process outlined above is modeled by a coupling of the continuum partial differential equations (PDEs) that govern heat and electrical conduction with those that govern temperature-dependent, elastic/plastic mechanical deformation (Wang and Hayden, 1999).

A.3 The computer models

Finite element implementations are used to provide a computerized model of the electro-thermal conceptual model. Similarly, a finite element implementation is made for the equilibrium and constitutive equations that comprise the conceptual model of mechanical/thermal deformation. These two computer models are implemented using two distinct modules of a commercial code (ANSYS).

Although the commercial finite element modules are distinct, they are coupled because the mechanical deformation affects the electro-thermal conduction process through its effect on the areas of the contacting surfaces. This is simulated in the computer model by passing the calculated temperature field to the deformation module, called as an external procedure, at intervals of a quarter of a cycle (1/240 seconds of simulated time). The updated contact areas are then passed back to the electro-thermal module from the deformation module.
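The structure of this quarter-cycle exchange can be caricatured as follows; the two functions below are purely illustrative stand-ins for the ANSYS modules, with made-up physics, intended only to show the data passed in each direction.

import numpy as np

def electrothermal_step(temps, contact_area, dt):
    # Toy stand-in: heating scales inversely with contact area (more
    # contact means less resistance); conduction relaxes toward ambient.
    return temps + dt * (50.0 / contact_area - 0.05 * (temps - 20.0))

def deformation_update(temps):
    # Toy stand-in: hotter, softer material deforms more, enlarging the
    # faying contact area.
    return 1.0 + 0.01 * max(temps.mean() - 20.0, 0.0)

dt = 1.0 / 240.0                     # a quarter of a cycle, in seconds
temps = np.full(10, 20.0)            # temperature field (deg C)
contact_area = 1.0                   # normalized contact area

for _ in range(240):                 # one second of simulated time
    temps = electrothermal_step(temps, contact_area, dt)   # pass area in
    contact_area = deformation_update(temps)               # pass temps back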

B Modeling for Vehicle Crashworthiness

Modeling the effects of a collision of a vehicle with a barrier is routinely done by implementing a dynamic analysis code using a finite element representation of a vehicle. A finite element model includes the following components: complete body "in white" including windshield, cradle, bumper system, doors, engine/transmission, suspension, exhaust system, rear axle, drive shaft, radiator, steering system, instrument panel beam, and brake booster. Additional mass is often used to represent nonstructural components and inessential objects, while maintaining the actual vehicle test weight and its center of gravity. The element size is generally between 10 mm and 15 mm. Holes smaller than 15 mm in diameter are not modeled unless located in critical areas; fillets, rounds, and radii less than 10 mm are ignored.

A finite element vehicle model consists mostly of shell and solid elements. Shell elements are used to model the rail, frame, and stamped/deep-drawn sheet panels; solid elements are used to model the bumper foam, radiator, battery, and the suspension system. An engine is usually modeled with shell elements on the exterior surface, with properly assigned mass and moments of inertia to represent the massive engine block. Since the crash behavior of the vehicle is the primary concern, there is greater detail in the crush zone structure/components and the connections between them, to create greater accuracy. The number of elements ranges from 50,000 to 300,000. The duration of the simulations is between 100 and 150 milliseconds (msec) for most frontal impact conditions.

The computer model is run using a commercial non-linear dynamic analysis code, LS-DYNA. Computational turnaround time can be great: from 1 to 5 days on a standard workstation.

Variables and sources of uncertainty in the vehicle manufacturing process and proving ground test procedures induce uncertainties in the test results. The acceleration and velocity histories of two production vehicles of the same type, subjected to 30mph zero degree rigid barrier frontal impact tests, as shown in Figure 15, demonstrate the differences in "replicate" crashes. There are a variety of materials used in components of the vehicle and, consequently, a variety of material properties to deal with, not all of which may be well specified.

Figure 15: Acceleration and velocity pulses in the occupant compartment from 30mph zero degree rigid barrier frontal impact tests for two production vehicles of the same type.

An Input/Uncertainty map for the crash model is given in Table 7.

C Technical details for Section 4

C.1 The GASP response-surface methodology

Recall that yM = (yM(x1, u1), . . . , yM(xm, um)) denotes the vector of m evaluations of the model at the inputs DM = {(xi, ui) : i = 1, . . . , m}, and we write z = (x, u). The computer model is exercised only at the inputs DM, so that yM(z) = yM(x, u) is effectively unknown for other inputs z = (x, u). Thus, in the Bayesian framework, we assign yM(z) a prior distribution, specifically a stationary Gaussian process with mean and covariance functions governed by parameters θL and θM = (λM, αM, βM), respectively.


INPUT | IMPACT | UNCERTAINTY | CURRENT STATUS

Geometry:
Element size about 10mm | 5 | unspecified | fixed
Holes < 10mm, fillets, and rounds not meshed | 4 | unspecified | fixed
Use of design surfaces, not surfaces after stamping | 4 | unspecified | fixed
Spot weld locations are approximated | 5 | unspecified | fixed
Thickness variation from location to location | 3 | can be specified with c.v. of 2% for whole components | controllable in some degree

Material Properties:
Dynamic stress/strain curves | 3 | may be approximated with c.v. of 5% for major components | controllable in some degree
Spot weld failure force | 3 | unspecified | fixed
Joints separation | 2 | approximated with 5% c.v. | controllable
Damping factor | 5 | controllable | fixed
Friction coefficients between parts | 4 | unspecified | fixed
Material density | 5 | unspecified | fixed

Boundary/Initial Conditions:
Vehicle mass/speed | 5 | can be matched with the test | fixed
Barrier variation (plywood condition and barrier angle) | 3 | unspecified | fixed
Vehicle test attitude | 5 | unspecified | fixed
Testing environment (humidity & temperature) | 5 | unspecified | fixed

Restraint System:
Steering column stroking force | 2 | c.v. of 5% | controllable
Airbag deployment time | 4 | c.v. of 5% | controllable
Seatbelt retractor force | 2 | c.v. of 5% | controllable
Airbag mass flow rate | 2 | c.v. of 5% | controllable
Occupant position | 3 | ±1/2 inch on horizontal plane and ±1/4 inch on vertical | controllable

Table 7: The input/uncertainty map for the math vehicle model.

The mean function of the Gaussian process was discussed in Section 4, and so we turn to discussion of the covariance function.

The parameter λM is the precision (that is, the inverse of the variance) of the Gaussian process, and the other parameters (αM, βM) control the correlation function of the Gaussian process, which we assume to be of the form

c^M(z, z^*) = \exp\Big\{ -\sum_{j=1}^{d} \beta_j^M \, |z_j - z_j^*|^{\alpha_j^M} \Big\} .   (C-1)

Here, d is the number of coordinates in z, the αMj are numbers between 0 and 2, and the βMj are positive scale parameters. The product form of the correlation function (each factor is itself a correlation function in one dimension) helps the computations made later. Prior beliefs about the smoothness properties of the function will affect the choice of αM. The choice αMj = 2 for all j reflects the belief that the function is infinitely differentiable, which is plausible for many engineering and scientific models.

This can be summarized by saying that, given the hyper-parameters θL and θM = (λM, αM, βM), the prior distribution of yM is GP(Ψ(·)θL, (1/λM) cM(·, ·)), i.e., a Gaussian process with the given mean and covariance function.

As before, let yM denote the vector of model evaluations at the set of inputs DM. Then, before observing the yM's, and conditionally on the hyper-parameters, yM has a multivariate normal distribution with covariance matrix ΓM = CM(DM, DM)/λM, where CM(DM, DM) is the matrix with (i, j) entry cM(zi, zj), for zi, zj in DM. Once yM is observed, it is a likelihood function for the parameters θL and θM (based solely on the observed yM).

If z is a new input value, then the conditional distribution of yM(z), given yM, θL and θM, is normal. Formally, the posterior density, p(yM(·) | yM, θL, θM), is a Gaussian process with mean and covariance function given by

E[y^M(z) \mid y^M, \theta^L, \theta^M] = \Psi(z)\theta^L + r_z' (\Gamma^M)^{-1} (y^M - X\theta^L) ,   (C-2)

Cov[y^M(z), y^M(z^*) \mid y^M, \theta^L, \theta^M] = \frac{1}{\lambda^M} c^M(z, z^*) - r_z' (\Gamma^M)^{-1} r_{z^*} ,   (C-3)

where r_z' = \frac{1}{\lambda^M} (c^M(z, z_1), \ldots, c^M(z, z_m)), ΓM is given above, 1 = (1, . . . , 1) and X is the matrix with rows Ψ(z1), . . . , Ψ(zm).

With specifications for θL and θM, the mean function, (C-2), can be used as an inexpensive emulator for yM(·). Indeed, the response surface approximation to yM(x, u), given θL and θM, is simply E[yM(x, u) | yM, θL, θM], and the variance of this approximation is c^M((x,u),(x,u))/\lambda^M - r_{(x,u)}' (\Gamma^M)^{-1} r_{(x,u)}. Note that this variance is zero at the design points at which the function was actually evaluated.
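A minimal sketch of (C-2) and (C-3) as an emulator, for a toy one-dimensional input with hyper-parameters assumed known (all data and parameter values below are hypothetical):

import numpy as np

def corr(Z1, Z2, beta, alpha):
    # Power-exponential correlation (C-1), product over coordinates
    D = np.abs(Z1[:, None, :] - Z2[None, :, :])
    return np.exp(-(beta * D ** alpha).sum(axis=-1))

def gasp_predict(Z, y, Znew, theta_L, lam, beta, alpha, Psi):
    # Plug-in GASP predictor (C-2) and pointwise variance (C-3)
    Gamma = corr(Z, Z, beta, alpha) / lam              # Gamma^M
    X = Psi(Z)
    r = corr(Znew, Z, beta, alpha) / lam               # rows are r_z'
    mean = Psi(Znew) @ theta_L + r @ np.linalg.solve(Gamma, y - X @ theta_L)
    S = np.linalg.solve(Gamma, r.T)                    # Gamma^{-1} r_z
    var = 1.0 / lam - np.einsum('ij,ji->i', r, S)      # zero at design points
    return mean, var

rng = np.random.default_rng(1)
Z = rng.random((8, 1))                                 # design points
y = np.sin(4 * Z[:, 0])                                # toy model output
Psi = lambda Z: np.ones((len(Z), 1))                   # constant mean basis
Znew = np.array([[0.3]])
mean, var = gasp_predict(Z, y, Znew, theta_L=np.array([0.0]), lam=1.0,
                         beta=np.array([5.0]), alpha=np.array([2.0]), Psi=Psi)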

However, the hyper-parameters θM, θL are rarely, if ever, known. Two possibilities then arise:

a) Plug in some estimates in the above formulae, for instance maximum likelihood estimates (as in the GASP software of W. Welch; see also Bayarri et al. (2002)), pretending they are the 'true' values. For MLE estimates \hat\theta^M, \hat\theta^L, this produces the following model approximation for inputs (x, u):

\hat y_{MLE}(x, u) = \Psi(x, u)\, \hat\theta^L + r_{(x,u)}' (\hat\Gamma^M)^{-1} (y^M - X \hat\theta^L) ,

where \hat\theta^M = (\hat\lambda^M, \hat\alpha^M, \hat\beta^M) is used to compute \hat\Gamma^M and r_{(x,u)}. Similarly, \hat\theta^M and \hat\theta^L are often plugged into Cov[y^M(x,u), y^M(x,u) \mid y^M, \hat\theta^L, \hat\theta^M] (see equation (C-3)) when computing an estimate of 'error'. Notice that this can result in an underestimation of the true variability, since the uncertainty in the estimates \hat\theta^M and \hat\theta^L is not taken into account.

b) Integrating the parameters with respect to the posterior distribution in a full Bayesian analysis (as described in Section 5 and in Bayarri et al., 2002) leads to a more appropriate emulator (model approximation), namely the integral of E[y^M(x, u) | y^M, θ^L, θ^M] with respect to the posterior distribution of θ^M, θ^L. Likewise, the variance of the emulator is obtained by averaging c^M((x,u),(x,u))/\lambda^M - r_{(x,u)}'(\Gamma^M)^{-1} r_{(x,u)} over the posterior distribution of θ^M, θ^L. In practice (see Bayarri et al. (2002)), this amounts to generating N (large) values (θ^L_i, θ^M_i) from the posterior distribution, and then simply evaluating the previous quantities at these generated values and taking the average. Hence the proposed Bayesian model approximation to y^M(·) is

\hat y^M(x, u) = \frac{1}{N} \sum_{i=1}^{N} \Big[ \Psi(x, u)\theta^L_i + r_i'(x, u) (\Gamma^M_i)^{-1} (y^M - X\theta^L_i) \Big] ,

where r_i'(x, u) and \Gamma^M_i are computed using the generated values \theta^M_i of the parameter (i = 1, . . . , N). Likewise, the proposed variance function is

\hat V^M(x, u) = \frac{1}{N} \sum_{i=1}^{N} \Big[ \frac{1}{\lambda^M_i} c^M_i((x,u),(x,u)) - r_i'(x, u) (\Gamma^M_i)^{-1} r_i(x, u) \Big] ,

where again the generated values of \theta^M_i are used to evaluate the functions c^M_i.

Note that, while the proposed (Bayesian) model approximation \hat y^M(x, u) will often be similar to its MLE counterpart, \hat y_{MLE}(x, u), the proposed variance function, \hat V^M(x, u), will typically be larger than the corresponding plug-in variance function, because it appropriately takes into account uncertainty in the parameters.
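Given draws (θ^L_i, θ^M_i) from the hyper-parameter posterior, the averaging in b) is a direct loop over the plug-in formulas. A sketch reusing gasp_predict and the toy objects from the block above (the 'posterior' draws here are fabricated stand-ins, not MCMC output):

# Hypothetical posterior draws for (theta_L, lambda, beta); alpha fixed at 2
draws = [dict(theta_L=np.array([rng.normal(0.0, 0.1)]),
              lam=rng.gamma(10.0, 0.1),
              beta=np.array([rng.gamma(5.0, 1.0)]),
              alpha=np.array([2.0])) for _ in range(200)]

results = [gasp_predict(Z, y, Znew, Psi=Psi, **d) for d in draws]
y_hat = np.mean([m for m, v in results], axis=0)   # \hat y^M(x,u)
V_hat = np.mean([v for m, v in results], axis=0)   # \hat V^M(x,u)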

The procedure for obtaining a sample from the posterior distribution of yM, for computing the posterior mean and variance above, can be summarized as follows.

1. Start with (i) the likelihood function of the model-run data, which from Bayarri et al. (2002) is proportional to a multivariate normal MVN(XθL, ΓM) distribution; (ii) prior distributions for (θL, θM), as given in Bayarri et al. (2002).

2. The posterior distribution of (θL, θM) is then approximated by an MCMC analysis that is a simplified version of that described in Bayarri et al. (2002).

3. The posterior distribution of yM(znew) is then obtained by first sampling the posterior distribution of (θL, θM), then sampling the multivariate normal with mean and covariance given by (C-2) and (C-3) with the sampled hyper-parameters. This is repeated many times to produce the required sample from the posterior distribution of yM.

This emulator can roughly be thought of as an interpolator of the data, unless there is numerical instability in the computer model, as mentioned in footnote 2, in which case the emulator smooths the data.


C.2 Processing stochastic inputs with GASP

One of the attractions of the particular form of the covariance function that we use for GASP is that it can greatly simplify the handling of certain types of stochastic inputs. Writing

a = (a_1, \ldots, a_m)' = \frac{1}{\lambda^M} (\Gamma^M)^{-1} (y^M - X\theta^L) ,

expressions (C-2) and (C-1) yield

E[y^M(z) \mid y^M, \theta^L, \theta^M] = \Psi(z)\theta^L + \sum_{i=1}^{m} a_i \, c^M(z, z_i)
  = \Psi(z)\theta^L + \sum_{i=1}^{m} a_i \prod_{j=1}^{d} \exp\{-\beta_j^M |z_j - z_{ij}|^{\alpha_j^M}\} .   (C-4)

Suppose now that inputs z_b, . . . , z_d are stochastic, with (for simplicity) independent densities p_j(z_j). Then taking the expectation of (C-4) over these random inputs yields

E[E[y^M(z) \mid y^M, \theta^L, \theta^M]] = E[\Psi(z)]\theta^L + \sum_{i=1}^{m} a_i \prod_{j=1}^{b-1} e^{-\beta_j^M |z_j - z_{ij}|^{\alpha_j^M}} \prod_{j=b}^{d} \int e^{-\beta_j^M |z_j - z_{ij}|^{\alpha_j^M}} p_j(z_j) \, dz_j .   (C-5)

Assuming the underlying basis functions Ψ(z) are chosen so that their expectation with respect to the p_j(z_j) is easily computable (trivial, for instance, if the mean function is linear), (C-5) shows that the expectation reduces to computation of a collection of one-dimensional integrals.

This can be an enormous computational simplification, especially when, say, optimization over the nonrandom inputs z_1, . . . , z_{b-1} is desired. The one-dimensional integrals in (C-5) can be carried out in a pre-processing step, and the optimization then easily implemented.

Even greater simplifications are possible if the α_j^M equal 1 or 2 and the p_j(z_j) are normal or exponential densities; the one-dimensional integrals can then be carried out in closed form. Furthermore, if the α_j^M = 2 (a possible choice if, for instance, the computer model is expected to be very smooth), then even a multivariate normal density for z_b, . . . , z_d will lead to closed form integrals.
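For instance, with α_j^M = 2 and a normal density p_j, the one-dimensional factor in (C-5) is a standard Gaussian integral. A minimal check of the closed form against quadrature (all parameter values hypothetical):

import numpy as np
from scipy.integrate import quad

beta_j = 4.0                       # beta_j^M, with alpha_j^M = 2
mu, s = 0.5, 0.2                   # z_j ~ N(mu, s^2)
z_ij = 0.1                         # design value of coordinate j for run i

# One-dimensional factor in (C-5), by numerical quadrature...
num, _ = quad(lambda z: np.exp(-beta_j * (z - z_ij) ** 2)
              * np.exp(-0.5 * ((z - mu) / s) ** 2) / (s * np.sqrt(2 * np.pi)),
              -np.inf, np.inf)

# ...and in closed form (a standard Gaussian integral)
closed = (np.exp(-beta_j * (mu - z_ij) ** 2 / (1 + 2 * beta_j * s ** 2))
          / np.sqrt(1 + 2 * beta_j * s ** 2))
print(abs(num - closed))           # agreement to quadrature accuracy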

D Technical details for Section 5

D.1 Prior distribution for the bias function

The prior density of the bias is taken to be another Gaussian process with correlation function as in (C-1), but with its own set of hyper-parameters. However, we restrict attention to smooth bias functions by fixing all components of the vector αb to be two. In part, this is done for technical reasons; since the bias cannot be observed directly, there is very little information available about αb, and numerical computations are more stable with αb specified. There is also the notion that the bias process might typically be smoother than the model process; for instance, the model process might only be 'off' by a level-shift, because of something forgotten or inappropriately specified in the model. Indeed, there is both empirical and 'folklore' evidence of this. Empirically, in the examples we have looked at, the maximum likelihood estimates of αb have mostly been near 2. As to folklore, it is often claimed that even biased models are typically accurate for predicting small changes, which would not be true if bias were not smoother than the model outputs. Finally, note that the bias can still assume the form of any infinitely differentiable function.

The mean function of the Gaussian process used to model the bias is typically chosen to be either zero or an unknown constant. (In the test beds, we used a zero mean for SPOT WELD and allowed a non-zero mean for CRASH.) More complicated linear structures, such as Ψ(z)θL, are possible, but we have not yet ascertained the extent to which they are helpful.

D.2 Analysis with model approximation

An outline of the Bayesian analysis when the model yM is inexpensive to run was given in Section 5.3. Typically, however, model runs are very expensive. The actual analysis, in this case, is similar to that described in Section 5.3, except that now yM is also viewed as unknown, with the Gaussian process prior. Indeed, we recommend that one directly use p(yM | yM, θM) from Appendix C.1 as the posterior distribution of yM. Then the only modification needed in the analysis in Section 5.3 is to draw yM(x, ui) directly from this posterior (i.e., draw (θL, θM) from its posterior, based on the model data, and then compute yM(x, ui) from the GASP posterior using these parameter values) whenever it is needed to compute the likelihood f(yF | u, λF, b).

If one is going to predict the process at some new input vector x, one can

Case 1. Compute yM(x, u) (preferable, if possible). It is then important to update the posterior distribution p(yM | yM, θM) by including the data point yM(x, u) in the data yM, but keeping the other aspects of the posterior distribution unchanged. This can be done by an updating formula that is given in Bayarri et al. (2002).

Case 2. Use the prediction arising directly from the above posterior, avoiding computation of yM(x, u).

The above analysis is really only an approximate Bayesian analysis, in two respects. First, the recommendation to include the data point yM(x, u) in the data yM, but keep the other aspects of the posterior distribution unchanged, is not the full Bayesian analysis; but a full analysis would require rerunning the entire MCMC with this data point added, which will rarely be feasible. The second approximate aspect of the analysis is that the formal posterior distribution of all unknowns is actually

p(y^M, u, \lambda^F, b, \theta^M \mid y^F, y^M) \propto f(y^F \mid y^M, u, \lambda^F, b)\, p(y^M \mid y^M, \theta^M)\, f(y^M \mid u, \theta^M)\, p(u, \lambda^F, b)\, p(\theta^M) ,

where p(θM) is the prior density of θM and we now recognize that yM is also unknown in the likelihood arising from yF. The posterior distribution p(yM | yM, θM) is readily available from the GASP theory. The main reason not to utilize this posterior is that it significantly increases the difficulty of performing the needed updating when yM(x, u) is computed.

E Technical details for Section 7

E.1 Kronecker product

Since we assume that the functions are discretized at the same points in DF_t for all x in the data, the overall design spaces (the sets of (x, t) points at which model-run and field data are obtained) can be written as the products DF_x × DF_t and DM_x × DM_t. The product form of the correlation functions for the model approximation and bias processes then induces a simple algebraic structure. Specifically, the correlation matrices induced by the correlation functions (the (i, j)th entries of the matrices are the c's evaluated at the ith and jth data points) have a form that can be manipulated to simplify computation. The basic idea lies in recognizing that the matrices have the form of a Kronecker product, defined as follows: the Kronecker product A ⊗ B of matrices A (m × n) and B (p × q) is the mp × nq matrix whose (i, j) block is a_{ij}B.

Indeed, if we denote the correlation matrices by C (there will be appropriate superscripts corresponding to the particular correlation functions generating the matrices), then each element of C is a product of an "x" term and a "t" term. Since the computer model design is of the form DM × DT, we can write CM,T as a Kronecker product:

C^{M,T} = C^M \otimes C^T ,   (E-1)

where CM and CT are correlation matrices corresponding to the x and t components of the correlation functions. The same can be done with Cb,T and CF,T. We could have different CT for M, b and F in (E-1), but we take these CT to be the same in each case for simplicity and to make computations feasible. The assumption that CT is the same for M, b and F is not unreasonable: any choice of CT results in the posterior means of yM and yF being interpolators as functions of t for fixed x, and we can always select enough t points to ensure enough accuracy in the predictions along t.

The advantage of the Kronecker product structure lies in the resulting simplifications for calculating inverses of the correlation matrices. This is crucial because these inverses must be calculated many times in the MCMC process that produces the posterior distributions of the model parameters. Specifically, the inverse of, say, CM,T is the Kronecker product of the component inverses (Searle, 1982):

(C^{M,T})^{-1} = (C^M)^{-1} \otimes (C^T)^{-1} .   (E-2)

Because the component matrices are m × m and T × T while CM,T is mT × mT, the computational savings in computing the inverse are obvious.

CRASH: Using data only for the straight frontal barrier, there are 9 different impact velocities, and if we use 19 time points we get a total of 171 data points. The correlation matrix, CM,T, corresponding to the computer runs is then a 171 × 171 matrix, but is a Kronecker product of 9 × 9 and 19 × 19 matrices.
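A minimal numerical check of (E-2), at the CRASH dimensions, using a simple exponential correlation in each coordinate (the grids and correlation parameters are hypothetical):

import numpy as np

def corr_1d(pts, beta):
    # Exponential correlation, chosen here for numerical stability
    return np.exp(-beta * np.abs(pts[:, None] - pts[None, :]))

Cx = corr_1d(np.linspace(0, 1, 9), 4.0)     # 9 impact velocities
Ct = corr_1d(np.linspace(0, 1, 19), 4.0)    # 19 time points

# (E-2): invert a 9 x 9 and a 19 x 19 matrix instead of a 171 x 171 one
inv_kron = np.kron(np.linalg.inv(Cx), np.linalg.inv(Ct))
direct = np.linalg.inv(np.kron(Cx, Ct))
print(np.max(np.abs(inv_kron - direct)))    # ~0, up to rounding error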

E.2 Analysis of function output

We have two sets of data: computer model data and field data. The measurement error term in the model for the field data must incorporate "time" dependence of the field observations. We give here only a sketch of the approach followed. Full details can be found in Bayarri et al. (2002). Let yF denote the sufficient statistic for the field data. Given the hyper-parameters of the Gaussian process priors, the distribution of the complete data vector y = (yM, yF) is

p(y \mid \theta^L, \beta^M, \lambda^M, \beta^b, \lambda^b, \lambda^F, \beta_t^{\epsilon}, \alpha_t^{\epsilon}, \alpha^M, \alpha^b) = N\Big( \big(\psi(z_1)', \ldots, \psi(z_n)'\big)' \, \theta^L , \; \Sigma \otimes C^T \Big) ,   (E-3)

where zi is the (x, t) input associated with yi, and Σ is a covariance matrix whose specific form is given in Bayarri et al. (2002).

The MCMC analysis (see Bayarri et al., 2002, for full details) can be carried out for function output. The necessary inversions of Σ ⊗ CT are simplified because of the Kronecker product structure and (E-2). The posterior distribution of yM(x, t), for each selected point in DP = DP_x × DP_t (DP_x contains the x points at which we want function realizations; DP_t is dense enough to get a good image of the function yM(x, ·) and is not the same as DT), can be obtained (simulated). This produces a prediction \hat y^M(x, t) of yM(x, t) and accounts for uncertainties in the unknown parameters. By doing so for each t we get a prediction of yM(x, ·) with pointwise uncertainties.

Inference for a specific evaluation criterion proceeds as follows. For each realization of the parameters, the output function is simulated (at least at a reasonably dense set of t; see Bayarri et al., 2002). Then the evaluation criterion is calculated. Repeating this yields a sample from its posterior distribution. This procedure can be applied to simulations from the posterior distribution of the model yM and from the posterior distribution of reality yR.
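Schematically, with posterior realizations of the output function in hand, the posterior for any evaluation criterion is obtained by applying the criterion draw by draw. In the sketch below the 'realizations' are fabricated curves, and the criterion (time for a velocity curve to drop below 5) is purely illustrative:

import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0.0, 100.0, 200)          # dense prediction grid in t

# Stand-ins for posterior realizations of y(x, .) at a fixed x; in the
# real analysis these come from the MCMC output.
curves = [15.0 * np.exp(-t / rng.uniform(20.0, 30.0)) for _ in range(1000)]

# Apply the evaluation criterion to each realization
crit = np.array([t[np.argmax(c < 5.0)] for c in curves])
lo, hi = np.percentile(crit, [2.5, 97.5]) # posterior interval for criterion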

F Technical details for Section 8

Suppose we have K related models. We assume that the models are related as follows (see Bayarri et al., 2002, for discussion):

1. All models and field data have common α's, common β's, common λM's and a common λF. The choice of priors for these is explained in Bayarri et al. (2002).

2. \theta^L_i \sim N_k(\mu, \mathrm{Diag}(\tau_1^{-1}, \ldots, \tau_k^{-1})), where p(\mu, \tau_1, \ldots, \tau_k \mid \lambda^M) = \prod_{i=1}^{k} [\tau_i^{-1} (\tau_i + \nu/\lambda^M)^{-1}], with ν equalling the average number of model runs.

3. log(λb_i) ∼ N(ξ, 4q²), where q is the proportional variation in the bias that is expected among models (e.g., q = 0.1). A constant prior density is assigned to ξ.


Figure 16: Posterior distributions for 4 barrier types, based on the hierarchical model.

The crucial input needed here is q in Assumption 3. The condition in Assumption 3 implies that the variance of log(λb_j/λb_i) is 8q², so that the standard deviation is √8 q. Therefore log(λb_j/λb_i) is likely to be less than 2q, or equivalently 2 log √(λb_j/λb_i) < 2q, and then √(λb_j/λb_i) < e^q ≈ 1 + q. So Assumption 3 is roughly equivalent to saying that the ratio of the standard deviations (SD) of the biases is less than 1 + q or, stated another way, the proportional variation in the bias is q. Specifying q = 0.1 is stating that the biases are expected to vary by about 10% among the various cases being considered. Then, also roughly, SD(log √(λb_j/λb_i)) ≈ log SD(√(λb_j/λb_i)) ≈ q, or SD(log(λb_j/λb_i)) ≈ 2q, which is a consequence of Assumption 3.

Note that, because of these assumptions, our earlier notation does not need to be changed to deal with the hierarchical situation (i.e., we simply add the index i, corresponding to different models, to the parameters θL, λM and λb that we allow to vary between models).
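A two-line simulation check of the variance claim in this derivation (ξ and q chosen arbitrarily):

import numpy as np

rng = np.random.default_rng(3)
q, xi = 0.1, 0.0
log_lam_i = rng.normal(xi, 2 * q, 1_000_000)   # log(lambda_b) for model i
log_lam_j = rng.normal(xi, 2 * q, 1_000_000)   # independent draw for model j
print(np.var(log_lam_j - log_lam_i))           # ~ 8 q^2 = 0.08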

CRASH: The hierarchical model is as given with q = 0.1, and for the prior on θL, we take ν = 6.5. The prior distributions are the same as used for the straight frontal analysis, which are described in Bayarri et al. (2002); this is reasonable, since the priors are relatively non-informative and the straight frontal dataset is the largest of the 4 categories.

Figure 16 shows the posterior distributions of log λb and θL for individual barrier types. Note that, while the assumed similarity between the models allows information to be passed from 'large data' to 'small data' models, the models are still allowed to vary significantly.
