ECMWF Training Course, Reading, 25 April 2006
EPS Diagnostic Tools
Renate Hagedorn, European Centre for Medium-Range Weather Forecasts
Objective of diagnostic/verification tools
Assess the quality of the forecast system, i.e. determine the skill and value of the forecast
A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
A forecast has value if it helps the user to make better decisions than without knowledge of the forecast.
• Forecasts with poor skill can be valuable (e.g. a feature predicted slightly in the wrong location scores badly but can still prompt useful action)
• Forecasts with high skill can be of little value (e.g. forecasting blue sky over a desert: almost always correct, but it adds nothing)
Ensemble Prediction System
• 1 control run + 50 perturbed runs (TL399 L62)
• added dimension of ensemble members: f(x, y, z, t, e)
• How do we deal with this added dimension when interpreting, verifying and diagnosing EPS output?
Individual members (“stamp maps”)
EPSgrams
[EPSgram panels: Cloud Cover, Precipitation, 10 m Wind, 2 m Temperature; each box-and-whisker symbol shows min, 25%, median, 75%, max of the ensemble distribution]
Ensemble mean
[Maps: Day+6 ensemble mean vs. Day+6 control]
• The ensemble mean forecast is the average over all ensemble members
• It gives a smoother field than the deterministic forecasts, but the same result can't be achieved with a simple filtering of a deterministic forecast
Ensemble mean
[Maps: Day+6 ensemble mean vs. Day+6 control (filtered)]
• If spread is large the EM may be a very weak pattern and may not represent any of the possible evolutions (use a measure of ensemble spread!)
Deterministic vs. Probabilistic use of EPS
Use the ensemble mean only, or make explicit use of the whole PDF?
[Schematics: a single deterministic value vs. the full forecast PDF]
Probabilistic forecast verification has similarities to deterministic verification:
Reliability ↔ Bias,  Resolution ↔ ACC,  Brier Score ↔ RMS
Why Probabilities?
• Open-air restaurant scenario: opening additional tables costs £20 extra and brings £100 extra income if T > 25°C; the weather forecast gives a 30% probability for T > 25°C. What would you do?
• Test the system for 100 days: 30 x (T > 25°C) -> 30 x (100 − 20) = 2400; 70 x (T < 25°C) -> 70 x (0 − 20) = −1400; net gain: +1000
• Employing the extra waiter (spending £20) is beneficial whenever the probability for T > 25°C is greater than 20%
• The higher/lower the cost-loss ratio, the higher/lower the probabilities needed in order to benefit from acting on the forecast
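A minimal sketch of this break-even calculation in Python (the figures follow the restaurant example above; the function name and structure are illustrative, not part of the original):

```python
# Expected net gain per day from opening the extra tables, given the
# forecast probability p that T > 25 C.
# cost: extra cost of acting (GBP 20); income: extra income if the event occurs (GBP 100).

def expected_gain(p, cost=20.0, income=100.0):
    # Gain (income - cost) with probability p, pure cost (-cost) otherwise.
    return p * (income - cost) - (1.0 - p) * cost

for p in (0.1, 0.2, 0.3, 0.5):
    print(f"p = {p:.1f}: expected gain = {expected_gain(p):+6.1f} GBP/day")
# Break-even at p = cost/income = 0.2, matching the 20% threshold above.
```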
Reliability
Take a sample of probabilistic forecasts: e.g. 30 days x 2200 grid points = 66,000 forecasts
How often was the event (T > 25°C) forecast with probability X?
FC Prob.    # FC     "perfect FC" OBS-Freq.    "real" OBS-Freq.
100%        8000     8000 (100%)               7200 (90%)
 90%        5000     4500 ( 90%)               4000 (80%)
 80%        4500     3600 ( 80%)               3000 (66%)
 ...         ...      ...                       ...
 10%        5500      550 ( 10%)                800 (15%)
  0%        7000        0 (  0%)                700 (10%)
Reliability
[Reliability diagram: the table above plotted as observed frequency (OBS-Frequency) against forecast probability (FC-Probability), both axes 0-100%]
Reliability Diagram
[Reliability diagrams: over-confident model vs. perfect model]
Reliability Diagram
[Reliability diagrams: under-confident model vs. perfect model]
Reliability diagram
Reliability score (the smaller, the better)
[Reliability diagrams: imperfect model vs. perfect model]
Components of the Brier Score
$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(f_i - o_i)^2$
N = total number of cases
I = number of probability bins
n_i = number of cases in probability bin i
f_i = forecast probability in probability bin i
o_i = frequency of event being observed when forecast with f_i
Reliability: forecast probability vs. observed relative frequencies
Reliability diagram
Reliability score (the smaller, the better); Resolution score (the bigger, the better)
[Reliability diagrams: poor resolution vs. good resolution; horizontal line marks the climatological frequency c]
Components of the Brier Score
$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(f_i - o_i)^2$
$\mathrm{RES} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(o_i - c)^2$
$\mathrm{UNC} = c\,(1 - c)$
c = frequency of event being observed in whole sample (other symbols as above)
Reliability: forecast probability vs. observed relative frequencies
Resolution: ability to issue reliable forecasts close to 0% or 100%
Uncertainty: variance of observed frequency in the sample
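A small sketch of this decomposition in Python, using the bins visible in the reliability table above (the elided middle bins are omitted, so the numbers are only illustrative; function and variable names are assumptions):

```python
import numpy as np

def brier_decomposition(n, f, o_freq, c=None):
    """Reliability, resolution and uncertainty terms of the Brier score.

    n:      cases per probability bin (n_i)
    f:      forecast probability of each bin (f_i)
    o_freq: observed frequency of the event in each bin (o_i)
    """
    n, f, o_freq = map(np.asarray, (n, f, o_freq))
    N = n.sum()
    if c is None:
        c = (n * o_freq).sum() / N      # sample climatology
    rel = (n * (f - o_freq) ** 2).sum() / N
    res = (n * (o_freq - c) ** 2).sum() / N
    unc = c * (1.0 - c)
    return rel, res, unc, rel - res + unc   # last term equals the Brier score

# Bins from the table above (0%, 10%, 80%, 90%, 100% only):
rel, res, unc, bs = brier_decomposition(
    n=[7000, 5500, 4500, 5000, 8000],
    f=[0.0, 0.1, 0.8, 0.9, 1.0],
    o_freq=[0.10, 0.15, 0.66, 0.80, 0.90],
)
print(f"REL={rel:.3f}  RES={res:.3f}  UNC={unc:.3f}  BS={bs:.3f}")
```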
Brier Score
• The Brier score is a measure of the accuracy of probability forecasts
$\mathrm{BS} = \frac{1}{N}\sum_{n=1}^{N} (p_n - o_n)^2$
• p_n is the forecast probability (fraction of members predicting the event)
• o_n is the observed outcome (1 if the event occurs; 0 if the event does not occur)
• BS varies from 0 (perfect deterministic forecasts) to 1 (perfectly wrong!)
$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c}$
• The Brier skill score (BSS) is a measure of skill relative to climatology (p = frequency of the event in the climate sample)
• positive (negative) BSS: better (worse) than reference
Brier Score = Reliability − Resolution + Uncertainty
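For the direct form of BS and BSS, a minimal sketch (the sample probabilities and outcomes below are invented for illustration):

```python
import numpy as np

def brier_score(p, o):
    """Brier score for forecast probabilities p and binary outcomes o."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

# Illustrative sample: ensemble probabilities and whether the event occurred.
p_fc = np.array([0.9, 0.7, 0.2, 0.0, 0.6, 0.1])
obs  = np.array([1,   1,   0,   0,   1,   0  ])

bs = brier_score(p_fc, obs)
bs_clim = brier_score(np.full_like(p_fc, obs.mean()), obs)  # climatology reference
bss = 1.0 - bs / bs_clim
print(f"BS = {bs:.3f}, BS_clim = {bs_clim:.3f}, BSS = {bss:.3f}")
```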
Reliability: 2m-Temp.>0
[Reliability diagrams for 2m-Temp > 0°C, one panel per forecast system (CERFACS, CNRM, ECMWF, INGV, LODYC, MPI, UKMO and the DEMETER multi-model), each annotated with its BSS, Rel-Sc and Res-Sc values]
1 month lead, start date May, 1980-2001
Brier Skill Score
Europe: 850hPa Temperature, D+4
[Time series, 1994-2006: Brier skill score (vs. long-term climatology, 3-month moving sample) of probability forecasts verified against analyses; fc step 96 h, T850 anomalies exceeding −8 K, −4 K, +4 K and +8 K; y-axis from −0.6 to 0.8]
Ranked Probability Score
• Measures the quadratic distance between forecast and verification probabilities for several categories
$\mathrm{RPS} = \frac{1}{K-1}\sum_{k=1}^{K} \mathrm{BS}_k$
• It is the average Brier score across the range of the variable: BS_k is the Brier score for the k-th of K cumulative categories
• The Ranked Probability Skill Score (RPSS) is a measure of skill relative to a reference forecast:
$\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_c}$
• positive (negative) RPSS: better (worse) than reference
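A minimal sketch of the RPS as an average of cumulative Brier scores (the category probabilities below are illustrative, not from the slides):

```python
import numpy as np

def rps(p_cat, obs_cat):
    """Ranked probability score for one forecast.

    p_cat:   forecast probability of each of the K ordered categories
    obs_cat: index of the observed category
    """
    p_cat = np.asarray(p_cat, float)
    K = p_cat.size
    p_cum = np.cumsum(p_cat)                 # cumulative forecast probabilities
    o_cum = (np.arange(K) >= obs_cat) * 1.0  # cumulative observation (0/1 step)
    return np.sum((p_cum - o_cum) ** 2) / (K - 1)

# Illustrative 5-category forecast; the event verified in category 2 (0-based).
print(f"RPS = {rps([0.1, 0.2, 0.4, 0.2, 0.1], obs_cat=2):.3f}")
```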
Brier Score -> Ranked Probability Score
• The Brier Score is used for two-category (yes/no) situations (e.g. T > 15°C)
[Schematic: forecast PDF with a single threshold at 15°C]
• The RPS takes into account the ordered nature of the variable ("extreme errors")
[Schematic: forecast PDF split into categories at 5, 10, 15, 20, 25°C]
Ranked Probability Skill Score
Northern Hemisphere: 500hPa Geopotential
Verification of two category (yes/no) situation
• Compute 2 x 2 contingency table: (for a set of cases)
• Event Probability: s = (a+c) / n
• Probability of a Forecast of occurrence: r = (a+b) / n
• Frequency Bias: B = (a+b) / (a+c)
• Proportion Correct: PC = (a+d) / n
                      Event observed
                      Yes     No      total
Event forecast  Yes    a       b      a+b
                No     c       d      c+d
                total  a+c     b+d    a+b+c+d = n
Example of Finley Tornado Forecasts (1884)
• Compute 2 x 2 contingency table: (for a set of cases)
                      Event observed
                      Yes     No     total
Event forecast  Yes    28      72     100
                No     23    2680    2703
                total  51    2752    2803

• Event Probability: s = (a+c) / n = 51/2803 = 0.018
• Probability of a Forecast of occurrence: r = (a+b) / n = 100/2803 = 0.036
• Frequency Bias: B = (a+b) / (a+c) = 100/51 = 1.961
• Proportion Correct: PC = (a+d) / n = 2708/2803 = 0.966
96.6% Accuracy
Example of Finley Tornado Forecasts (1884)
• Compute 2 x 2 contingency table: (for a set of cases)
                      Event observed
                      Yes     No     total
Event forecast  Yes     0       0       0
                No     51    2752    2803
                total  51    2752    2803

• Event Probability: s = (a+c) / n = 51/2803 = 0.018
• Probability of a Forecast of occurrence: r = (a+b) / n = 0/2803 = 0.0
• Frequency Bias: B = (a+b) / (a+c) = 0/51 = 0.0
• Proportion Correct: PC = (a+d) / n = 2752/2803 = 0.982
98.2% Accuracy!
Some Scores and Skill Scores
Score                       Formula                                   Finley      Finley          Finley
                                                                      (original)  (never fc T.)   (always fc T.)
Proportion Correct          PC = (a+d)/n                              0.966       0.982           0.018
Threat Score                TS = a/(a+b+c)                            0.228       0.000           0.018
Odds Ratio                  Θ = (ad)/(bc)                             45.3        -               -
Odds Ratio Skill Score      Q = (ad-bc)/(ad+bc)                       0.957       -               -
Heidke Skill Score          HSS = 2(ad-bc)/[(a+c)(c+d)+(a+b)(b+d)]    0.355       0.0             0.0
Peirce Skill Score          PSS = (ad-bc)/[(a+c)(b+d)]                0.523       0.0             0.0
Clayton Skill Score         CSS = (ad-bc)/[(a+b)(c+d)]                0.271       -               -
Gilbert Skill Score (ETS)   GSS = (a-a_ref)/(a-a_ref+b+c),            0.216       0.0             0.0
                            a_ref = (a+b)(a+c)/n
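A sketch computing these scores from the four cells of the contingency table (the function name is an illustrative choice); applied to the original Finley counts it reproduces the first column above:

```python
def contingency_scores(a, b, c, d):
    """Skill scores from a 2x2 contingency table
    (hits a, false alarms b, misses c, correct rejections d)."""
    n = a + b + c + d
    a_ref = (a + b) * (a + c) / n  # hits expected by chance
    return {
        "PC":  (a + d) / n,
        "TS":  a / (a + b + c),
        "OddsRatio": (a * d) / (b * c) if b * c else float("inf"),
        "Q":   (a * d - b * c) / (a * d + b * c),
        "HSS": 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "PSS": (a * d - b * c) / ((a + c) * (b + d)),
        "CSS": (a * d - b * c) / ((a + b) * (c + d)),
        "GSS": (a - a_ref) / (a - a_ref + b + c),
    }

# Finley (original): reproduces PC=0.966, TS=0.228, HSS=0.355, PSS=0.523, ...
for name, value in contingency_scores(28, 72, 23, 2680).items():
    print(f"{name:>9s} = {value:.3f}")
```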
Definition of a proper score
• Consistency is one of the characteristics of a good forecast
• Some scoring rules encourage forecasters to be inconsistent, e.g. some scores give better results when a forecast closer to climatology is issued rather than the actual forecast (e.g. reliability)
• A scoring rule is strictly proper when the best scores are obtained if and only if the forecasts correspond with the forecaster's judgement
• Examples of proper scores are the Brier Score and the Ignorance Score
Ignorance Score:  $\mathrm{IGN} = -\frac{1}{n}\sum_{n}\sum_{i} p_{n,i,\mathrm{ver}} \ln p_{n,i,\mathrm{fc}}$
• n: forecast-verification pairs, i: quantiles
• Minimum only when p_fc = p_ver -> proper score
• The lower/higher the IGN, the better/worse the forecast system
(see Roulston & Smith, 2001)
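A minimal sketch of the ignorance score for categorical forecasts (here the verification distribution is taken to be one-hot, i.e. all probability on the observed quantile; sample values are invented, and a small floor keeps ln(0) finite):

```python
import numpy as np

def ignorance(p_fc, obs_idx, floor=1e-10):
    """Mean negative log-likelihood of the observed category.

    p_fc:    (n_cases, n_categories) forecast probabilities
    obs_idx: observed category index for each case
    """
    p_fc = np.clip(np.asarray(p_fc, float), floor, 1.0)
    picked = p_fc[np.arange(len(obs_idx)), obs_idx]  # probability given to what occurred
    return -np.mean(np.log(picked))

# Two illustrative 3-category forecasts; observations fell in categories 0 and 2.
print(f"IGN = {ignorance([[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]], [0, 2]):.3f}")
```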
Verification of two category (yes/no) situation
• Compute 2 x 2 contingency table: (for a set of cases)
• Event Probability: s = (a+c) / n
• Probability of a Forecast of occurrence: r = (a+b) / n
• Frequency Bias: B = (a+b) / (a+c)
• Hit Rate: H = a / (a+c)
• False Alarm Rate: F = b / (b+d)
• False Alarm Ratio: FAR = b / (a+b)
                      Event observed
                      Yes     No      total
Event forecast  Yes    a       b      a+b
                No     c       d      c+d
                total  a+c     b+d    a+b+c+d = n
Example of Finley Tornado Forecasts (1884)
• Compute 2 x 2 contingency table: (for a set of cases)
                      Event observed
                      Yes     No     total
Event forecast  Yes    28      72     100
                No     23    2680    2703
                total  51    2752    2803
• Event Probability: s = (a+c) / n = 0.018
• Probability of a Forecast of occurrence: r = (a+b) / n = 0.036
• Frequency Bias: B = (a+b) / (a+c) = 1.961
• Hit Rate: H = a / (a+c) = 0.549
• False Alarm Rate: F = b / (b+d) = 0.026
• False Alarm Ratio: FAR = b / (a+b) = 0.720
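These three rates follow directly from the same table; a quick check against the values above (variable names are illustrative):

```python
a, b, c, d = 28, 72, 23, 2680  # Finley tornado forecasts

H   = a / (a + c)  # Hit Rate: fraction of observed events that were forecast
F   = b / (b + d)  # False Alarm Rate: fraction of non-events forecast as events
FAR = b / (a + b)  # False Alarm Ratio: fraction of "yes" forecasts that failed
print(f"H = {H:.3f}, F = {F:.3f}, FAR = {FAR:.3f}")  # 0.549, 0.026, 0.720
```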
Extension of 2 x 2 contingency table for prob. FC

                  Event observed
Event forecast    Yes    No        threshold      H         F
>80% - 100%        30     5        >80%        30/105     5/105
>60% - 80%         25    10        >60%        55/105    15/105
>40% - 60%         20    15        >40%        75/105    30/105
>20% - 40%         15    20        >20%        90/105    50/105
>0% - 20%          10    25        >0%        100/105    75/105
0%                  5    30        (all)      105/105   105/105
total             105   105
Extension of 2 x 2 contingency table for prob. FC
                  Event observed
Event forecast    Yes    No        threshold      H       F
>80% - 100%        30     5        >80%         0.29    0.05
>60% - 80%         25    10        >60%         0.52    0.14
>40% - 60%         20    15        >40%         0.71    0.29
>20% - 40%         15    20        >20%         0.86    0.48
>0% - 20%          10    25        >0%          0.95    0.71
0%                  5    30        (all)        1.00    1.00
total             105   105

[Plot: Hit Rate against False Alarm Rate for these thresholds, both axes 0-1]
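A sketch of how the threshold columns follow from the bin counts by cumulative summation (bin counts taken from the table above; variable names are illustrative):

```python
import numpy as np

# Observed yes/no counts per forecast-probability bin, highest bin first.
yes = np.array([30, 25, 20, 15, 10, 5])   # event observed
no  = np.array([ 5, 10, 15, 20, 25, 30])  # event not observed

H = np.cumsum(yes) / yes.sum()  # hit rate for thresholds >80%, >60%, ..., >0%, all
F = np.cumsum(no)  / no.sum()   # false alarm rate for the same thresholds
for t, h, f in zip([">80%", ">60%", ">40%", ">20%", ">0%", "(all)"], H, F):
    print(f"{t:>5s}: H = {h:.2f}, F = {f:.2f}")
```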
ROC curve
• The ROC curve is a plot of H against F for a range of probability thresholds (low thresholds plot towards the upper right, high thresholds towards the lower left)
• The ROC area A (area under the ROC curve) is a skill measure: A = 0.5 (no skill), A = 1 (perfect deterministic forecast)
[ROC curve for the example above, Hit Rate vs. False Alarm Rate: A = 0.83]
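A sketch of the ROC area by trapezoidal integration over the (F, H) points from the table above. Note this simple estimate gives A ≈ 0.78 for the example table; the slide's A = 0.83 may come from a fitted curve, so the two need not agree exactly:

```python
import numpy as np

# (F, H) points from the threshold table above, with the (0, 0) end point added.
F = np.array([0, 5, 15, 30, 50, 75, 105]) / 105
H = np.array([0, 30, 55, 75, 90, 100, 105]) / 105

A = np.sum((F[1:] - F[:-1]) * (H[1:] + H[:-1]) / 2)  # trapezoidal rule
print(f"A = {A:.2f}, ROCSS = 2A - 1 = {2 * A - 1:.2f}")
```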
ROC area
ROCA vs. RPSS vs. BSS
ROCSS vs. BSS
$\mathrm{ROCSS} = 2A - 1 \qquad\qquad \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c}$
• ROCSS or BSS > 0 indicates a skilful forecast system
[Maps: ROC skill score vs. Brier skill score; Northern Extra-Tropics, 500 hPa anomalies > 2σ, spring 2002; Richardson, 2005]
Benefits for different users - decision making
• A user (or "decision maker") is sensitive to a specific weather event
• The user has a choice of two actions: do nothing and risk a potential loss L if the weather event occurs, or take preventative action at a cost C to protect against the loss L
• No forecast information: either always take action or never take action
• Deterministic forecast: act when adverse weather is predicted
• Probability forecast: act when the probability of the specific event exceeds a certain threshold; this threshold depends on the user
• Value V of a forecast: savings made by using the forecast, normalised so that V = 1 for a perfect forecast and V = 0 for a forecast no better than climatology
• Simplest possible case, but it shows many important features (see also Richardson, 2000)
Decision making: the cost-loss model
• Climate information - expense: $E_C = \min(C, oL)$
• Always use forecast - expense: $E_F = aC + bC + cL$
• Perfect forecast - expense: $E_P = oC$
• Value: $V = \dfrac{\text{saving from using forecast}}{\text{saving from perfect forecast}} = \dfrac{E_C - E_F}{E_C - E_P}$

Potential costs:
                 Event occurs
                 Yes    No
Action     Yes    C      C
taken      No     L      0

Fraction of occurrences:
                 Event occurs
                 Yes    No
Event      Yes    a      b
forecast   No     c      d
                  o     1-o
Decision making: the cost-loss model
$V = \dfrac{\text{saving from using forecast}}{\text{saving from perfect forecast}} = \dfrac{E_C - E_F}{E_C - E_P} = \dfrac{\min(\alpha, o) - F\alpha(1-o) + Ho(1-\alpha) - o}{\min(\alpha, o) - o\alpha}$
with: α = C/L,  H = a/(a+c),  F = b/(b+d),  o = a+c

[Value curve: Northern Extra-Tropics (winter 01/02), D+5 deterministic FC > 1 mm precipitation]
• For a given weather event and FC system, o, H and F are fixed
• value depends on C/L
• value is maximised when C/L = o
• V_max = H − F
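A sketch of this value calculation over a range of cost-loss ratios (the H, F and o values below are invented for illustration, not read off the slide; it confirms the maximum V = H − F at C/L = o):

```python
import numpy as np

def value(alpha, H, F, o):
    """Potential economic value V for cost-loss ratio alpha = C/L."""
    e_clim = np.minimum(alpha, o)                               # E_C / L
    e_fc   = H * o * alpha + F * (1 - o) * alpha + (1 - H) * o  # E_F / L
    e_perf = o * alpha                                          # E_P / L
    return (e_clim - e_fc) / (e_clim - e_perf)

H, F, o = 0.6, 0.1, 0.3   # illustrative hit rate, false alarm rate, base rate
alphas = np.linspace(0.05, 0.95, 19)
v = value(alphas, H, F, o)
print(f"max V = {v.max():.2f} near C/L = {alphas[v.argmax()]:.2f}  (H - F = {H - F:.2f})")
```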
Potential economic value
Northern Extra-Tropics (winter 01/02), D+5 FC > 1 mm precipitation
[Value curves: deterministic forecast vs. EPS, for probability thresholds p = 0.2, p = 0.5, p = 0.8]
Potential economic value
Northern Extra-Tropics (winter 01/02), FC > 1 mm precipitation
EPS: each user chooses the most appropriate probability threshold
[Value curves: Control vs. EPS]
Results based on simple cost-loss models have indicated that EPS probabilistic forecasts have a higher value than single deterministic forecasts
Potential economic value
Northern Extra-Tropics (winter 01/02), D+5 FC > 20 mm precipitation
• BSS = 0.06 (a measure of overall value for all possible users)
• ROCSS = 0.65 (closely linked to V_max)
Summary
• Different ways of incorporating the added dimension of the EPS (EM vs. PDF)
• The ensemble mean is the best deterministic forecast; the EM should be used together with a measure of spread
• Verification of probability forecasts: different scores measure different aspects of forecast performance (Reliability/Resolution, Brier Score (BSS), RPS (RPSS), ROC, …); the perceived usefulness of the ensemble may vary with the score used; it is important to understand the behaviour of the different scores and choose appropriately
• Potential economic value: decision making is user dependent; the cost-loss model is a simple illustration, but it shows many useful features
References and further reading
• Katz, R. W. and A. H. Murphy, 1997: Economic Value of Weather and Climate Forecasts. Cambridge University Press, pp. 222.
• Roulston, M. S. and L. A. Smith, 2001: Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130, 1653-1660.
• Palmer, T. N. and R. Hagedorn (editors), 2006: Predictability of Weather and Climate. Cambridge University Press (available from July 2006).
• Jolliffe, I. T. and D. B. Stephenson, 2003: Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley, pp. 240.
• Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed., Academic Press, pp. 627.
• ECMWF Newsletter for updates on EPS performance.
• Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. Q. J. R. Meteorol. Soc., 126, 649-668.