RULES FOR RESPONSIBLE MODEL BUILDING
William James
University Professor Emeritus
President, CHI
Guelph, Canada
"All models are wrong, though some may be said to be useful." (G.E.P. Box)
It's not enough to know simply when or how a model may be said to be useful - it's more important to know how reliable it is.
R:1
A model is a concept. Concepts are used in thinking, scientific deduction, engineering design and forensics. They are improved by experience. We do not necessarily require the model that most closely approaches perfection; rather, we seek the model that provides an acceptably accurate explanation. Simple models are often said to be “better” than complex models.
Optimal model complexity depends on the questions to be resolved and the resources available.
Your model should meet your own ethical standards – it should:
• accept the limits of the discipline of engineering;
• improve and restore the natural balances and biodiversity;
• correct the human behaviour that caused the problem to the ecosystem;
• imitate the structure of the natural, native or indigenous system;
• be good for all parts of the natural system;
• not enrich one individual or group to the distress or impoverishment of another;
• be in harmony with good character, cultural value, and moral law.
R:70
• the living world is the matrix for all design,
• design should follow the laws of life,
• biological equity must determine design,
• design must reflect bioregionality,
• projects should use renewable energy systems,
• design should integrate living systems,
• projects should heal the planet, and
• design should follow a sacred ecology.
R:69
R:12
Fundamental tenet: variance can be systematically reduced by including (explaining) more and more relevant processes, at higher temporal and spatial resolution.
The implicit problem in critical thinking is to find the most probable flaws in an argument, to discern the best lines of thought and to improve the argument. The solution may be stated: if we test the argument, perhaps over a long time, which parts of the argument are less likely to be valid, and how may the experience be better explained otherwise?
The implicit problem in scientific method is to find the optimum or sufficient description of the dominant processes. The solution may be stated: if we test the current explanation of dominant processes over a long period of time, e.g. 75 years for an engineering environmental problem, is the description optimal in the sense that it is the most parsimonious description that meets the required, or imposed, uncertainty?
The implicit problem in engineering design is to find the optimum cost-effective array of best practices. A solution may be stated: if the 75-year rainfall time series that occurred at the International Airport had in fact occurred at Foxran Estates, then plan 126 would have been the most cost-effective of the 329 plans examined (had they, of course, all existed over this time).
The implicit problem in engineering forensics is to find the most credible explanation for an acute problem, and to suggest a cost-effective solution, which generally means replacing the acute problem with a chronic one. The price to be paid is vigilance.
Concerns include: What array of models should be used? What is the model's applicability in the context of the study objectives? What accuracy is achievable? What is the uncertainty of the model? What investment of modeling effort is most cost-efficient? Is cost-efficiency appropriate for optimizing an uncertain model?
Rule: A model is used to help select the best among competing proposals. It is fundamentally irresponsible and unethical for modelers not to interpret the inherent uncertainty.
R:2
Steps in model construction
1. Review and re-state the problem.
2. Construct the as-is model input data set.
3. Select model performance evaluation criteria.
4. Select an objective function.
5. Calibrate and evaluate the model.
6. Satisfied? If no, go back to step 1; if yes, go to step 7.
7. Model several theoretical or to-be situations.
8. Select the likely best alternative.
9. Report the best solution and its uncertainty.
R:5
Rule: Computed and observed time series are more ethically represented as smudges than single-valued lines.
R:7
Rule: Objectives must be simplified and related to the computed output and objective functions. The model must include code that adequately describes all significant processes.
R:8
$$C = \sum_{m=1}^{N_m} \sum_{s=1}^{N_s} \sum_{p=1}^{N_{pr}} pa_{m,s,p}$$
where: C = model complexity,
Nm = the number of modules active in the model,
Ns = the number of sub-spaces modeled in each module,
Npr = the number of processes modeled in each sub-space,
pa = the input parameters required for each process.
R:20
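The complexity measure C defined by Nm, Ns, Npr and pa can be sketched as a triple sum. This is a minimal illustration that assumes `pa` is supplied as nested lists holding the number of input parameters for each process in each sub-space of each module. The function name `model_complexity` is hypothetical. Unlike the fixed Ns and Npr in the definitions, the nested lists here also tolerate modules with different numbers of sub-spaces or processes.

```python
from typing import List


def model_complexity(pa: List[List[List[int]]]) -> int:
    """Sum parameter counts over modules m, sub-spaces s and processes p.

    pa[m][s][p] is the number of input parameters required by process p
    in sub-space s of module m (a hypothetical data layout).
    """
    return sum(
        n_params
        for module in pa          # m = 1 .. Nm
        for subspace in module    # s = 1 .. Ns
        for n_params in subspace  # p = 1 .. Npr
    )


# Two modules: the first has two sub-spaces, the second has one.
example = [[[3, 2], [4]], [[1, 1, 1]]]
print(model_complexity(example))  # 12
```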
Cost is taken to be a combination of:
1. engineering fees to design alternative solutions;
2. construction costs of the selected alternative;
3. intangible costs; and
4. costs due to uncertainty of the selected option.
R:48
[Figure: evaluation function ($) versus complexity C, formed as the sum (+) of a design costs term and a model error term; the minimum of the evaluation function marks the optimum complexity. Note: Bill's suggested relations and numbers.]
R:62
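The trade-off between a design-cost term that grows with complexity and a model-error term that shrinks with it can be explored numerically. The sketch below uses hypothetical cost curves, with design cost rising linearly in C and the cost of model error falling as 1/C. The slide does not give the actual relations, so these functional forms and numbers are illustrative only. For forms a·C + b/C, the optimum can also be found analytically, at C* = sqrt(b/a).

```python
from typing import Callable, Iterable, Tuple


def optimum_complexity(
    design_cost: Callable[[float], float],
    model_error_cost: Callable[[float], float],
    complexities: Iterable[float],
) -> Tuple[float, float]:
    """Return (C, evaluation) minimizing design_cost(C) + model_error_cost(C)."""
    return min(
        ((c, design_cost(c) + model_error_cost(c)) for c in complexities),
        key=lambda pair: pair[1],
    )


# Hypothetical cost curves: design cost grows linearly with C,
# the cost of model error falls as 1/C.
best_c, best_cost = optimum_complexity(
    design_cost=lambda c: 1.0 * c,
    model_error_cost=lambda c: 100.0 / c,
    complexities=range(1, 1001),
)
print(best_c, best_cost)  # 10 20.0
```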
Rule: In determining the best level of complexity, test simple models first, proceeding to more complex ones, until the required accuracy of the computed response function is achieved. Use the fewest processes and discretized spaces, and the biggest time step, that deliver the required uncertainty.
R:51
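The "simple first" rule amounts to a search in order of increasing complexity that stops at the first adequate model. The sketch below assumes each candidate is supplied as a name, a complexity count and a callable that runs and evaluates that model, returning its error. All names and error values here are hypothetical. Evaluation is lazy, so a more complex model is only run when every simpler one has failed.

```python
from typing import Callable, List, Optional, Tuple

# (name, complexity C, function that runs the model and returns its error)
Candidate = Tuple[str, int, Callable[[], float]]


def simplest_adequate_model(
    candidates: List[Candidate], required_error: float
) -> Optional[Tuple[str, int, float]]:
    """Test candidates from simplest to most complex; stop at the first
    one whose error meets the required accuracy."""
    for name, complexity, evaluate in sorted(candidates, key=lambda c: c[1]):
        error = evaluate()  # only run a model when it is actually needed
        if error <= required_error:
            return name, complexity, error
    return None  # no candidate achieves the required accuracy


candidates = [
    ("detailed", 80, lambda: 0.05),
    ("lumped", 5, lambda: 0.40),
    ("semi-distributed", 20, lambda: 0.15),
]
print(simplest_adequate_model(candidates, required_error=0.2))
# ('semi-distributed', 20, 0.15)
```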
Sensitivity analysis consists of:
1. varying model coefficients one at a time, with the amount varied being representative of the uncertainty in the parameter being analyzed;
2. dividing the resulting dimensionless change in computed response by the dimensionless parameter variation; and then
3. ranking the resulting sensitivity gradients.
R:129
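The three-step, one-at-a-time procedure can be sketched as follows. It assumes the model is a callable that maps a parameter dictionary to a single scalar response, such as peak flow, and that the base response is non-zero. The function name and the toy model are hypothetical. Each parameter is scaled by its own fractional variation, and the relative change in response is divided by that variation to give a dimensionless gradient.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def rank_sensitivities(
    model: Callable[[Params], float],
    base: Params,
    variation: Dict[str, float],
) -> List[Tuple[str, float]]:
    """One-at-a-time sensitivity gradients, ranked by magnitude.

    variation[name] is the fractional change applied to that parameter
    (e.g. 0.1 for +10%), chosen to reflect its uncertainty.
    """
    r0 = model(base)
    gradients = {}
    for name, p0 in base.items():
        dp = variation[name]
        perturbed = dict(base)
        perturbed[name] = p0 * (1.0 + dp)  # vary one coefficient only
        dr = (model(perturbed) - r0) / r0  # dimensionless response change
        gradients[name] = dr / dp          # divide by dimensionless variation
    return sorted(gradients.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Toy response R = a * b**2: R is more sensitive to b than to a.
toy = lambda p: p["a"] * p["b"] ** 2
print(rank_sensitivities(toy, {"a": 2.0, "b": 3.0}, {"a": 0.1, "b": 0.1}))
# b ranks first (gradient about 2.1), then a (about 1.0)
```

Because the variation is finite rather than infinitesimal, a non-linear response gives a gradient that depends on the step size (2.1 rather than 2.0 for b here). This is one reason to keep the test range small.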
[Figure: non-linear sensitivity gradients for peak flow, medium-duration, medium-intensity event (0.3 in/hr for 1 hr), Location 100. Flow (cfs, about 0.75 to 0.875) versus percent change in parameter (-7.5% to +7.5%), for parameters WW1, WAREA, WW3, WSLOPE, WW5, WW6, WW7, WW8, WW9, WW10 and WW11.]
Wkbk:59
Rule: Do not test a generalized program per se for sensitivity, parameter optimization, or error, because individual applications are likely to be radically different. Values of parameters in the input datafile determine which processes will be dominant or dormant. Relative parameter values change both the model sensitivity and the model uncertainty. Each model application must be separately tested over the relevant range of model
R:3
Categorize input parameters into four groups:
1. can be measured with almost total certainty;
2. can be readily measured in the field or laboratory;
3. cannot be easily measured in the field or laboratory;
4. cannot be measured with any certainty at all.
[Figure: model process calibration, linking parameter estimation, continuous model, event calibration and sensitivity analysis. © W James '97]
[Figure: flowchart in two stages. 1. Calibration: start parameters, calibration IFs, user input, programs (datafile, model, post-processor), RFs, OFs and EFs, and a decision "OK?" (No/Yes) involving parameter optimization, sensitivity analysis and error analysis. 2. Inferences: model, long-term IFs, inference, continuous fuzzy RFs, End.]
R:105
24 Response Functions (PCSWMM 2005 utilities; PCSWMM terminology):
• Nodes: depth, head, volume, lateral inflow, total inflow, flooding
• Links: flow, depth, velocity, capacity
• System: temperature, rainfall, snow depth, losses, runoff, dry weather inflow, groundwater inflow, RDII inflow, direct inflow, total inflow, flooding, outflow, storage
[Figure: typical cycle in a response or input function, RF(t) or IF(t), with times t1,1, t1,2, t1,3, t1,4, t2,1 and t2,2 marked against the critical level RFcrit, IFcrit. The functions may be observed, synthetic or computed; RFcrit and IFcrit are arbitrary.]
R:117
OF1: (t2,1 - t1,1) duration of wet event
OF2: (t2,2 - t1,3) duration of dry event
OF3: RF(t1,3) peak flow, flux, or concentration
OF4: RF(t1,1) minimum flow, flux, or concentration
OF5: ∫ RF(t) dt from t1,1 to t1,4, total wet event flow or flux
OF6: (t1,4 - t1,2) duration of exceedance
OF7: (t2,2 - t1,4) duration of deficit
OF8: n[RF > RFcrit] number of exceedances
OF9: n[RF < RFcrit] number of deficits
OF10: ∫ (RFcrit - RF(t)) dt from t1,4 to t2,2, volume of deficit
OF11: ∫ (RF(t) - RFcrit) dt from t1,2 to t1,4, volume of excess
OF12: OF5/OF1 wet event mean concentration
OF13: ∫ RF(t) dt from t1,4 to t2,1, total dry event flow or flux
OF14: OF13/OF2 dry event mean concentration
R:117
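$$OF_5 = \int_{t_{1,1}}^{t_{1,4}} RF(t)\,dt$$
$$OF_{10} = \int_{t_{1,4}}^{t_{2,2}} \left[RF_{crit} - RF(t)\right] dt$$
$$OF_{11} = \int_{t_{1,2}}^{t_{1,4}} \left[RF(t) - RF_{crit}\right] dt$$
$$OF_{13} = \int_{t_{1,4}}^{t_{2,1}} RF(t)\,dt$$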
R:118
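Several of the objective functions can be computed directly from a sampled response function. The sketch below makes several assumptions. The series is uniformly sampled at step `dt`, and integrals use the rectangle rule. Exceedances and deficits (OF8, OF9) are counted as separate runs above or below RFcrit. The statistics cover whatever slice of the record is passed in, rather than a single cycle bounded by t1,1 ... t2,2. All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ObjectiveFunctions:
    peak: float                 # cf. OF3
    minimum: float              # cf. OF4
    total_volume: float         # cf. OF5 or OF13, depending on the slice
    exceedance_duration: float  # cf. OF6
    deficit_duration: float     # cf. OF7
    n_exceedances: int          # cf. OF8, runs above RFcrit
    n_deficits: int             # cf. OF9, runs below RFcrit
    deficit_volume: float       # cf. OF10
    excess_volume: float        # cf. OF11


def _count_runs(flags: Sequence[bool]) -> int:
    """Number of separate runs of True values."""
    runs, previous = 0, False
    for flag in flags:
        if flag and not previous:
            runs += 1
        previous = flag
    return runs


def objective_functions(rf: Sequence[float], rf_crit: float, dt: float) -> ObjectiveFunctions:
    above: List[bool] = [x > rf_crit for x in rf]
    below: List[bool] = [x < rf_crit for x in rf]
    return ObjectiveFunctions(
        peak=max(rf),
        minimum=min(rf),
        total_volume=sum(rf) * dt,  # rectangle rule
        exceedance_duration=sum(above) * dt,
        deficit_duration=sum(below) * dt,
        n_exceedances=_count_runs(above),
        n_deficits=_count_runs(below),
        deficit_volume=sum(rf_crit - x for x in rf if x < rf_crit) * dt,
        excess_volume=sum(x - rf_crit for x in rf if x > rf_crit) * dt,
    )


print(objective_functions([0.0, 2.0, 5.0, 3.0, 1.0, 0.0], rf_crit=2.0, dt=1.0))
```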
Dominant process Objective function
Overland flow over impervious areas OF3
Infiltration into the upper soil mantle OF4
Pollutant washoff OF5
Erosion OF1
Overland flow over pervious areas OF3
Pollutant build-up OF5
Recovery of storages OF2
Recovery of loss (infiltration) rates OF4
Recession of storages OF7
Evaporation *IF8
Snowmelt *IF11
Snow accumulation *IF7
R:119
Rule: Select the best objective function thoughtfully, by relating it back to the original design questions. Use the minimum acceptable number of objective functions.
R:119
1. observation error, related to field instrumentation, comprising two components, one random and one systematic;
2. sampling error, associated with the timing and location of the field equipment;
3. numerical error, associated with the numerical methods used in the code;
4. structural error, related to disaggregation (the number and resolution of the processes active);
5. structural error, related to discretization (the spatial resolution);
6. structural error, related to poor formulation of one or more of the component process relations and code; and
7. propagated error, related to erroneous parameters.
R:123
[Figure: framework for uncertainty analysis, spanning prior knowledge, the calibration process (identify the as-is model) and the design process (inference to the to-be and as-was scenarios).]
Internal description:
1. aggregation error
2. numerical error
3. structural error
4. discretization error
5. input environment datafile error
6. model structure and state-parameter error
7. parameter optimization error
8. parameter propagation error
9. error analysis
External description:
1. uncertainty due to natural variability, or unobserved input disturbances
2. measurement and sampling errors of observed input and output
3. start-up error
4. input TS datafile error
5. model error
6. uncertainty of to-be parameters
7. user output-interpretation error
R:124
Rule: Sixteen sources of error are listed in the framework for uncertainty analysis presented here. When interpreting the computed output from your model, all sixteen sources should be explicitly interpreted.
R:127
Model users must be able to:
1. isolate the important empirical parameters that require refining (calibration),
2. associate these parameters with their correct processes (there may be more than one),
3. isolate the conditions under which the processes are active (again, there may be more than one),
4. select state-variable events (SV sub-spaces) for sensitivity analysis (these may be hypothetical events), and then
5. select state-variable events from the observed record for calibration analyses.
R:136
[Figure: event classes A, B, C and D plotted on axes (OFi)c (computed) and (OFi)o (observed).]
A represents “small” events; B represents “medium” events; C represents “big” events; D represents fuzzy overlaps.
R:137
Short-duration-high-intensity SDHI 20 min; 3.0 in/h
Medium-duration-high-intensity MDHI 60 min; 1.0 in/h
Long-duration-high-intensity LDHI 600 min; 0.2 in/h
Short-duration-medium-intensity SDMI 20 min; 0.4 in/h
Medium-duration-medium-intensity MDMI 60 min; 0.3 in/h
Long-duration-medium-intensity LDMI 600 min; 0.1 in/h
Short-duration-low-intensity SDLI 20 min; 0.1 in/h
Medium-duration-low-intensity MDLI 60 min; 0.1 in/h
Long-duration-low-intensity LDLI 600 min; 0.1 in/h
Evapo-transpiration:
Short-duration-high-intensity SDHI 1 d; 0.5 in/d
Long-duration-high-intensity LDHI 10 d; 0.3 in/d
Short-duration-low-intensity SDLI 1 d; 0.05 in/d
Long-duration-low-intensity LDLI 10 d; 0.05 in/d
R:139
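The rain event classes can be turned into hypothetical input events for sensitivity testing. The durations and intensities below are taken from the rain table. The constant-intensity (block) shape, the 5-minute time step and the function name are assumptions, since the slide does not state the hyetograph shape. Total depth is intensity times duration; for example, SDHI gives 3.0 in/h × 20/60 h = 1.0 in.

```python
from typing import Dict, List, Tuple

# Rain event classes from the table: class -> (duration in min, intensity in in/h)
DESIGN_EVENTS: Dict[str, Tuple[float, float]] = {
    "SDHI": (20.0, 3.0), "MDHI": (60.0, 1.0), "LDHI": (600.0, 0.2),
    "SDMI": (20.0, 0.4), "MDMI": (60.0, 0.3), "LDMI": (600.0, 0.1),
    "SDLI": (20.0, 0.1), "MDLI": (60.0, 0.1), "LDLI": (600.0, 0.1),
}


def block_hyetograph(
    duration_min: float, intensity_in_per_h: float, step_min: float = 5.0
) -> Tuple[List[float], float]:
    """Constant-intensity event: per-step intensities (in/h) and total depth (in)."""
    if duration_min % step_min != 0:
        raise ValueError("duration must be a whole number of time steps")
    n_steps = int(duration_min // step_min)
    series = [intensity_in_per_h] * n_steps
    depth_in = intensity_in_per_h * duration_min / 60.0
    return series, depth_in


for name, (duration, intensity) in DESIGN_EVENTS.items():
    series, depth = block_hyetograph(duration, intensity)
    print(f"{name}: {len(series)} steps, {depth:.2f} in")
```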
Rain:
Light rate of rain: overland flow over impervious areas
Medium rate of rain: infiltration into upper soil mantle; pollutant washoff
Heavy rate of rain: erosion; pollutant washoff; pervious area flow
Long duration rain: overland flow over pervious areas
No rain:
Long duration drought: pollutant build-up; groundwater depletion
Short duration drought: storage recessions
Temperature:
High temperatures: evapo-transpiration; snowmelt
Low temperatures: snow accumulation and ripening
Wind:
High wind: snowmelt
R:140
Rule: Associate parameters with processes, and processes with causative events, and causative events with limited state-variable sub-spaces.
R:140
A total error statistic (EFt) may be used to quantify overall goodness of fit:
$$EF_t = (1.0 - w)\left[\frac{1}{n}\sum_{i=1}^{n}\left(OOF_i - COF_i\right)^2\right]^{1/2} + w\left|OPF_p - CPF_p\right|$$
where:
EFt = total error statistic (m3/s);
w = weighting factor;
n = number of measured hourly flows;
OOF = measured flow (m3/s);
COF = computed flow (m3/s);
OPF = measured peak flow (m3/s); and
CPF = computed peak flow (m3/s).
R:142
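The total error statistic combines a root-mean-square error over the n hourly flows with a weighted absolute peak-flow error. The sketch below assumes the measured and computed peaks (OPF, CPF) are taken as the maxima of the two hourly series. The slide may intend separately measured peaks, in which case those values would be passed in instead. The function name is hypothetical.

```python
import math
from typing import Sequence


def total_error_statistic(
    observed: Sequence[float], computed: Sequence[float], w: float
) -> float:
    """EFt = (1 - w) * RMS error of the flows + w * |OPF - CPF| (m3/s)."""
    if len(observed) != len(computed) or not observed:
        raise ValueError("need equal-length, non-empty series")
    n = len(observed)
    rms = math.sqrt(sum((o - c) ** 2 for o, c in zip(observed, computed)) / n)
    peak_error = abs(max(observed) - max(computed))  # OPF, CPF as series maxima
    return (1.0 - w) * rms + w * peak_error


print(total_error_statistic([1.0, 2.0, 3.0], [1.0, 2.0, 5.0], w=0.5))
```

With w = 0, the statistic reduces to the RMS error; with w = 1, it scores only the peak.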
Rule: Use first-order error analysis to report the estimated propagated error in your recommended design solution.
R:156
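First-order error analysis propagates parameter uncertainty through a linearized model. The estimated variance of the response is the sum of the squared products of each partial derivative and that parameter's standard error. The sketch below assumes independent parameter errors and estimates each derivative with a central finite difference. The function name and the toy model are hypothetical.

```python
import math
from typing import Callable, Dict

Params = Dict[str, float]


def first_order_error(
    model: Callable[[Params], float],
    params: Params,
    sigmas: Dict[str, float],
    rel_step: float = 1e-6,
) -> float:
    """Propagated standard error of model(params) by first-order analysis.

    Assumes independent parameter errors: sigma_y^2 = sum((dy/dx_i * sigma_i)^2),
    with each derivative estimated by a central finite difference.
    """
    variance = 0.0
    for name, x0 in params.items():
        h = rel_step * abs(x0) if x0 != 0.0 else rel_step
        up, down = dict(params), dict(params)
        up[name], down[name] = x0 + h, x0 - h
        dydx = (model(up) - model(down)) / (2.0 * h)
        variance += (dydx * sigmas[name]) ** 2
    return math.sqrt(variance)


# y = a * b with a = 2 +/- 0.1 and b = 3 +/- 0.2:
# dy/da = 3 and dy/db = 2, so analytically sigma_y = sqrt(0.3**2 + 0.4**2) = 0.5
print(first_order_error(lambda p: p["a"] * p["b"], {"a": 2.0, "b": 3.0},
                        {"a": 0.1, "b": 0.2}))
```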
[Figure: state-variable space divided into sub-spaces A to H, with rate of rain (+ve; zero; -ve, the evapo-transpiration rate) against duration of rain; two regions are marked "not used".]
R:165
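[Figure: membership (0.0 to 1.0) of sub-spaces A to I across duration classes (zero, short, medium, long) for evapo-transpiration and rate of rain, with sub-spaces grouped as I; B,H; A,E; D; and C,F,G.]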
R:166
General form: If the X period is Y, analyze Z parameters, where X, Y and Z have the following meanings:
X; Y; Z
1. rain; long; erosion
2. rain; medium; pervious area flow
3. rain; medium; pollutant washoff
4. rain; short; impervious area flow
5. rain; short; rain-out
6. ET; exists; recovery of storages
7. ET; exists; recovery of loss rates
8. ET; exists; groundwater depletion
9. ET; medium; pollutant build-up
R:167
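The "If X period is Y, analyze Z parameters" rules can be held as a small lookup table. This lets each event picked from the record be mapped to the parameter groups worth analyzing. The rule entries below are transcribed from the X; Y; Z rule table. The function name and the string labels for periods and conditions are hypothetical.

```python
from typing import List, Tuple

# (X period, Y condition, Z parameter group), transcribed from the rule table
ANALYSIS_RULES: List[Tuple[str, str, str]] = [
    ("rain", "long", "erosion"),
    ("rain", "medium", "pervious area flow"),
    ("rain", "medium", "pollutant washoff"),
    ("rain", "short", "impervious area flow"),
    ("rain", "short", "rain-out"),
    ("ET", "exists", "recovery of storages"),
    ("ET", "exists", "recovery of loss rates"),
    ("ET", "exists", "groundwater depletion"),
    ("ET", "medium", "pollutant build-up"),
]


def parameters_to_analyze(period: str, condition: str) -> List[str]:
    """Apply 'If X period is Y, analyze Z parameters' to one event."""
    return [z for x, y, z in ANALYSIS_RULES if x == period and y == condition]


print(parameters_to_analyze("rain", "medium"))
# ['pervious area flow', 'pollutant washoff']
```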
Rule: Analyse only sensitive parameters, and then only against relevant events.
R:167
Framework for continuous modeling:
At your desk:
1. Make a list of simplified design questions, and postulate the relationship between your list and your proposed objective functions.
R:169
2. Select the best objective functions and response functions for your study problem. Minimize the computed output and computer execution times. Allocate storage space for computed time series management.
R:169
3. Obtain or generate a credible, very-long-term time series to drive your model for design inference.
R:169
4. Obtain a short but sufficient record of good, observed events to calibrate your model.
R:169
Using the PCSWMM4 shell:
5. List all parameters that need to be optimized, and their associated processes.
R:169
6. Associate all processes with the limited state-variable sub-spaces where they dominate.
R:169
7. Search the good observed record for a sufficient number of appropriate events.
R:169
8. Estimate, for each input parameter: 1. the mean most likely value, 2. a higher most likely value, and 3. a lower most likely value. Choose the sensitivity test range, but keep it small.
R:169
9. Carry out the sensitivity tests, and rank all parameters, in terms of their dimensionless sensitivity gradients.
R:169
10. Optimize the parameters to give the smallest error.
R:169
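Step 10 can be illustrated for a single parameter with a golden-section search. This is a standard line-search method that assumes the error function is unimodal over the search interval. It is swapped in here for illustration and is not the optimizer built into PCSWMM. The toy "model" (flow = k × rain), the data and the function name are hypothetical. Optimizing one sensitive parameter at a time is consistent with the advice against calibrating all parameters simultaneously.

```python
import math
from typing import Callable


def golden_section_minimize(
    f: Callable[[float], float], lo: float, hi: float, tol: float = 1e-6
) -> float:
    """Minimize a unimodal error function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # about 0.618
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while b - a > tol:
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:        # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


# Hypothetical calibration: find the coefficient k for which k * rain
# best matches observed flows (sum of squared errors).
rain = [0.0, 1.0, 2.5, 4.0]
observed = [0.6 * r for r in rain]
sse = lambda k: sum((k * r - o) ** 2 for r, o in zip(rain, observed))
k_best = golden_section_minimize(sse, 0.0, 1.0)
print(round(k_best, 4))  # 0.6
```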
11. Run the calibrated model for the long-term time series for each array of BMPs.
R:169
12. Infer which is the best array. Rerun the model for this array estimating the error in the computed response functions.
R:169
13. Study all the input and output information again; make certain that it is logical, and gain knowledge about the performance of the drainage system. Interpret the impact of the errors.
R:169
At your client's office:
14. Report your recommendations, and, provided you follow the logic, become rich and famous.
R:169
The following 8 rules form a personal catechism for honest, very-long-term, continuous surface water quality modeling:
R:171
Rule 1: Do not calibrate all parameters simultaneously against a long-term continuous observed record, notwithstanding any early advice to the contrary in the literature.
R:171
Rule 2: Transpose or synthesize a long-term, hydro-meteorologic input time-series from the same hydrologic region, and use this for inferring comparative performance of various arrays of BMPs. Many records of 50 years duration or longer are available.
R:171
Rule 3: Carefully choose the best objective functions that represent the design questions and the model variability. Get the advisory committee to justify the selections in writing.
R:171
Rule 4: In order to control the amount of computing, associate the input parameters with processes, and processes with causative events, and causative events with limited state-variable sub-spaces. For this activity, sensitivity analysis code in PCSWMM4 is helpful. Do not analyze parameters outside these spaces.
R:171
Rule 5: Use three estimates of the most likely parameter values. It is more meaningful to compare the computed responses from several reasonable models than to compare responses computed using extreme values.
R:171
Rule 6: Assume that the water quality model (WQM) is approximately linear for the purposes of optimizing parameters and estimating the propagated error. Then analyze for sensitivity near the mean expected values of all input parameters.
R:171
Rule 7: Calibrate only sensitive parameters, and then only against relevant events for which you have good, short-term observed data. These data must include good rate-of-rain measurements with adequate coverage and spatial resolution.
R:171
Rule 8: Use first-order linear error analysis, and report the estimated propagated error in your recommended design solution.
R:171
The end
See you online at:
• www.computationalhydraulics.com
• www.eos.uoguelph.ca/webfiles/james