Data mining issues on improving the accuracy of the rainfall-runoff model for flood forecasting
Jia Liu
Supervisor: Dr. Dawei Han
Email: [email protected]
WEMRC, Department of Civil Engineering
University of Bristol
24 May 2010
Outline
Introduction to the Probability Distributed Model (PDM)
Two data mining issues:
Selection of data for model calibration
Optimal data time interval in flood forecasting
Conclusions and future work
Introduction to rainfall-runoff model
Hydrological cycle: Rainfall (and Evaporation) -> Rainfall-Runoff Model -> Runoff
Rainfall-runoff model:
A conceptual representation of the hydrological cycle
Fundamental to water research, e.g. real-time flood forecasting, land-use change evaluation and the design of hydraulic structures
Probability Distributed Model (PDM) by Moore (1985)
13 model parameters to be calibrated:
f_c, T_d, c_min, c_max, b, b_e, k_g, b_g, S_t, k_1, k_2, k_b, q_c
How to cope with the 'data rich' environment?
Data rich = large quantity of data + fast sampling rate
Questions proposed:
A. How to select the most appropriate data to calibrate the model?
1. How long should the data be? (Data Length)
2. Which period should the data be selected from? (Data Duration)
B. When used for forecasting, what is the most appropriate sampling rate? (Data Time Interval)
Calibration data selection: data length and duration
The data used for model validation are often fixed in advance.
We assume that the more similar the calibration data are to the validation data, the better the rainfall-runoff model should perform after calibration.
[Figure: comparison of the information quality of the two data sets. Panels: calibration data set and validation data set; axes in m3/s (0 to 30) and mm (0 to 100).]
A good information quality of the calibration data set = a similar information content to the validation data set
Calibration data selection: data length and duration
An index that reveals the similarity between the calibration and validation data sets can be used as a guide for calibration data selection for the rainfall-runoff model.
Information Cost Function (ICF)
The ICF is an entropy-like function that gives a good estimate of the degree of disorder of a system:
Energy of detail: $E_j = \sum_k C_{j,k}^2$
Energy of approximation: $E_j = \sum_k S_{j,k}^2$
Percentile energy on each decomposition level: $P_j = E_j / \sum_j E_j$
$\mathrm{ICF} = -\sum_j P_j \ln P_j$
Fast Fourier Transform
Discrete Wavelet Decomposition
Flow Duration Curve
Liu, J., and D. Han (2010), Indices for calibration data selection of the rainfall-runoff model, Water Resour. Res., 46, W04512, doi:10.1029/2009WR008668.
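The ICF definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Liu and Han (2010): it assumes a Haar wavelet, assumes that the percentile energies are taken over the detail energies of every level plus the approximation energy of the coarsest level, and uses hypothetical function names (`haar_dwt_energies`, `icf`).

```python
import numpy as np

def haar_dwt_energies(x, levels):
    """Energies of a Haar decomposition (assumed wavelet, for illustration).

    Returns [E_detail_1, ..., E_detail_J, E_approx_J].
    """
    s = np.asarray(x, dtype=float)
    energies = []
    for _ in range(levels):
        if len(s) < 2:
            break
        if len(s) % 2:                 # drop the last sample so pairs are complete
            s = s[:-1]
        even, odd = s[0::2], s[1::2]
        detail = (even - odd) / np.sqrt(2.0)   # detail coefficients C_{j,k}
        s = (even + odd) / np.sqrt(2.0)        # approximation coefficients S_{j,k}
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(s ** 2))            # approximation energy, coarsest level
    return np.array(energies)

def icf(x, levels=4):
    """Entropy-like Information Cost Function: -sum_j P_j ln P_j."""
    e = haar_dwt_energies(x, levels)
    total = e.sum()
    if total == 0:
        raise ValueError("signal has zero energy; ICF is undefined")
    p = e / total                      # percentile energy per level
    p = p[p > 0]                       # treat 0 * ln(0) as 0
    return float(-np.sum(p * np.log(p)))
```

One possible use is to compute `icf` for a candidate calibration series and for the validation series, and to prefer the candidate whose value is closest to that of the validation set.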
[Figure: schematic with axes X, Y and Z relating forecast lead time, data time interval and model error, with sections of error against time interval for a short lead time and a long lead time.]
Optimal data time interval – for the forecast mode
Optimal time interval: sampling theory
Sampling rate of model input data:
Too slow: lower boundary given by the sampling theorem, $f_s \geq 2B$
Too fast: leading to numerical problems [Åström, 1968; Ljung, 1989]
Hypothetical curve: a positive relation between the optimal data time interval and the forecast lead time
[Figure: hypothetical curve of data time interval against forecast lead time.]
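The lower boundary from sampling theory can be turned into a quick check on a candidate data time interval. The sketch below assumes the bound takes the usual form f_s >= 2B, where B is the highest frequency present in the signal; the function names and the example bandwidth are hypothetical.

```python
def max_time_interval(bandwidth):
    """Largest sampling interval allowed by f_s >= 2B.

    bandwidth: highest signal frequency B, in cycles per time unit.
    Returns the interval in the same time unit.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    return 1.0 / (2.0 * bandwidth)

def satisfies_lower_bound(time_interval, bandwidth):
    """True if sampling every `time_interval` meets f_s >= 2B."""
    return 1.0 / time_interval >= 2.0 * bandwidth

# Hypothetical example: fastest fluctuation is one cycle per 4 hours (B = 0.25 per hour)
print(max_time_interval(0.25))           # 2.0 hours
print(satisfies_lower_bound(4.0, 0.25))  # False: sampling every 4 hours is too slow
```

This covers only the "too slow" side; the "too fast" side depends on the numerical behaviour of the model and is not captured by this check.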
Optimal data time interval – for the forecast mode
Case study
Auto-Regressive Moving Average (ARMA) model for on-line updating
Four catchments are selected from Southwest England:

Catchment         AREA (km²)   LDP (km)   DPSBAR (m/km)
A  Bellever          21.5        13.5         94.9
B  Halsewater        87.8        19.4         85.7
C  Brue             135.2        22.6         71.1
D  Bishop_Hull      202.0        40.2         98.0

LDP: longest drainage path (km)
DPSBAR: mean drainage path slope (m/km)
[Figure: location maps of the four catchments: Bellever, Halsewater, Brue, Bishop_Hull.]
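The on-line updating step mentioned in the case study can be illustrated with a simplified error-correction sketch. It is not the ARMA model used in the study: it fits a purely auto-regressive model (the special case of ARMA with no moving-average terms) to past simulation errors by least squares, and the function names and the order p are assumptions made for illustration.

```python
import numpy as np

def fit_ar(errors, p=2):
    """Least-squares fit of e[t] = phi_1 e[t-1] + ... + phi_p e[t-p]."""
    e = np.asarray(errors, dtype=float)
    # Column i holds e[t-1-i] for every target time t = p .. len(e)-1
    X = np.column_stack([e[p - i - 1 : len(e) - i - 1] for i in range(p)])
    y = e[p:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def forecast_errors(errors, coeffs, steps):
    """Iterate the fitted AR model forward over the forecast lead time."""
    hist = list(np.asarray(errors, dtype=float)[-len(coeffs):])
    out = []
    for _ in range(steps):
        nxt = float(np.dot(coeffs, hist[::-1]))  # most recent error first
        out.append(nxt)
        hist = hist[1:] + [nxt]
    return np.array(out)

def updated_forecast(simulated, errors, p=2):
    """Add forecast errors (observed minus simulated) to the model forecast."""
    coeffs = fit_ar(errors, p)
    return np.asarray(simulated, dtype=float) + forecast_errors(
        errors, coeffs, len(simulated)
    )
```

Here `errors` is the series of observed minus simulated flows up to the forecast origin, and `simulated` is the rainfall-runoff model output over the lead time.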
Optimal data time interval – for the forecast mode
Case study
The positive pattern between the optimal data time interval and the forecast lead time is found to be highly related to the catchment concentration time.
[Figure: four 3D surface plots with axes X, Y and Z (Z from 0 to 1), one per catchment: Bellever, Halsewater, Brue, Bishop_Hull.]
Conclusions and Future work
Selecting data with the most appropriate length, duration and time interval is of great significance in improving the model performance and helps to enhance the efficiency of data utilization in rainfall-runoff modelling and forecasting.
More research is needed to explore the applicability of the ICF index for calibration data selection and to verify the hypothetical curve of the optimal data time interval.
[Diagram: Weather Research & Forecasting (WRF) Model -> Rainfall (and Evaporation), as real-time inputs -> Rainfall-Runoff Model -> Runoff; labelled "Updated by observations".]
The End
Thank you for your attention!