ECMWF Training Course, Reading, 25 April 2006
EPS Diagnostic Tools
Renate Hagedorn, European Centre for Medium-Range Weather Forecasts
Objective of diagnostic/verification tools
Assess the quality of the forecast system, i.e. determine the skill and value of the forecast
A forecast has skill if it predicts the observed conditions well according to some objective or subjective criteria.
A forecast has value if it helps the user to make better decisions than without knowledge of the forecast.
• Forecasts with poor skill can be valuable (e.g. a feature predicted slightly in the wrong location scores badly but can still prompt useful action)
• Forecasts with high skill can be of little value (e.g. forecasting blue sky over a desert: almost always correct, but it adds nothing)
Ensemble Prediction System
• 1 control run + 50 perturbed runs (TL399 L62)
• added dimension of ensemble members: f(x, y, z, t, e)
• How do we deal with this added dimension when interpreting, verifying and diagnosing EPS output?
Individual members (“stamp maps”)
EPSgrams
[EPSgram panels: Cloud Cover, Precipitation, 10 m Wind, 2 m Temperature; each box-and-whisker symbol shows min, 25%, median, 75%, max of the ensemble distribution]
Ensemble mean
[Maps: Day+6 ensemble mean vs. Day+6 control]
• The ensemble mean forecast is the average over all ensemble members
• It gives a smoother field than the deterministic forecasts, but the same result can't be achieved with a simple filtering of a deterministic forecast
Ensemble mean
[Maps: Day+6 ensemble mean vs. Day+6 control (filtered)]
• If spread is large the EM may be a very weak pattern and may not represent any of the possible evolutions (use a measure of ensemble spread!)
Deterministic vs. Probabilistic use of EPS
Use the ensemble mean only, or make explicit use of the whole PDF?
[Schematics: a single deterministic value vs. the full forecast PDF]
Probabilistic forecast verification has similarities to deterministic verification:
Reliability ↔ Bias,  Resolution ↔ ACC,  Brier Score ↔ RMS
Why Probabilities?
• Open-air restaurant scenario: opening additional tables costs £20 extra and brings £100 extra income if T > 25°C; the weather forecast gives a 30% probability for T > 25°C. What would you do?
• Test the system for 100 days: 30 x (T > 25°C) -> 30 x (100 − 20) = 2400; 70 x (T < 25°C) -> 70 x (0 − 20) = −1400; net gain: +1000
• Employing the extra waiter (spending £20) is beneficial whenever the probability for T > 25°C is greater than 20%
• The higher/lower the cost-loss ratio, the higher/lower the probabilities needed in order to benefit from acting on the forecast
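A minimal sketch of this break-even calculation in Python (the figures follow the restaurant example above; the function name and structure are illustrative, not part of the original):

```python
# Expected net gain per day from opening the extra tables, given the
# forecast probability p that T > 25 C.
# cost: extra cost of acting (GBP 20); income: extra income if the event occurs (GBP 100).

def expected_gain(p, cost=20.0, income=100.0):
    # Gain (income - cost) with probability p, pure cost (-cost) otherwise.
    return p * (income - cost) - (1.0 - p) * cost

for p in (0.1, 0.2, 0.3, 0.5):
    print(f"p = {p:.1f}: expected gain = {expected_gain(p):+6.1f} GBP/day")
# Break-even at p = cost/income = 0.2, matching the 20% threshold above.
```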
Reliability
Take a sample of probabilistic forecasts: e.g. 30 days x 2200 grid points = 66,000 forecasts
How often was the event (T > 25°C) forecast with probability X?
FC Prob.    # FC     "perfect FC" OBS-Freq.    "real" OBS-Freq.
100%        8000     8000 (100%)               7200 (90%)
 90%        5000     4500 ( 90%)               4000 (80%)
 80%        4500     3600 ( 80%)               3000 (66%)
 ...         ...      ...                       ...
 10%        5500      550 ( 10%)                800 (15%)
  0%        7000        0 (  0%)                700 (10%)
Reliability
[Reliability diagram: the table above plotted as observed frequency (OBS-Frequency) against forecast probability (FC-Probability), both axes 0-100%]
Reliability Diagram
[Reliability diagrams: over-confident model vs. perfect model]
Reliability Diagram
[Reliability diagrams: under-confident model vs. perfect model]
Reliability diagram
Reliability score (the smaller, the better)
[Reliability diagrams: imperfect model vs. perfect model]
Components of the Brier Score
$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(f_i - o_i)^2$
N = total number of cases
I = number of probability bins
n_i = number of cases in probability bin i
f_i = forecast probability in probability bin i
o_i = frequency of event being observed when forecast with f_i
Reliability: forecast probability vs. observed relative frequencies
Reliability diagram
Reliability score (the smaller, the better); Resolution score (the bigger, the better)
[Reliability diagrams: poor resolution vs. good resolution; horizontal line marks the climatological frequency c]
Components of the Brier Score
$\mathrm{REL} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(f_i - o_i)^2$
$\mathrm{RES} = \frac{1}{N}\sum_{i=1}^{I} n_i\,(o_i - c)^2$
$\mathrm{UNC} = c\,(1 - c)$
c = frequency of event being observed in whole sample (other symbols as above)
Reliability: forecast probability vs. observed relative frequencies
Resolution: ability to issue reliable forecasts close to 0% or 100%
Uncertainty: variance of observed frequency in the sample
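A small sketch of this decomposition in Python, using the bins visible in the reliability table above (the elided middle bins are omitted, so the numbers are only illustrative; function and variable names are assumptions):

```python
import numpy as np

def brier_decomposition(n, f, o_freq, c=None):
    """Reliability, resolution and uncertainty terms of the Brier score.

    n:      cases per probability bin (n_i)
    f:      forecast probability of each bin (f_i)
    o_freq: observed frequency of the event in each bin (o_i)
    """
    n, f, o_freq = map(np.asarray, (n, f, o_freq))
    N = n.sum()
    if c is None:
        c = (n * o_freq).sum() / N      # sample climatology
    rel = (n * (f - o_freq) ** 2).sum() / N
    res = (n * (o_freq - c) ** 2).sum() / N
    unc = c * (1.0 - c)
    return rel, res, unc, rel - res + unc   # last term equals the Brier score

# Bins from the table above (0%, 10%, 80%, 90%, 100% only):
rel, res, unc, bs = brier_decomposition(
    n=[7000, 5500, 4500, 5000, 8000],
    f=[0.0, 0.1, 0.8, 0.9, 1.0],
    o_freq=[0.10, 0.15, 0.66, 0.80, 0.90],
)
print(f"REL={rel:.3f}  RES={res:.3f}  UNC={unc:.3f}  BS={bs:.3f}")
```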
Brier Score
• The Brier score is a measure of the accuracy of probability forecasts
$\mathrm{BS} = \frac{1}{N}\sum_{n=1}^{N} (p_n - o_n)^2$
• p_n is the forecast probability (fraction of members predicting the event)
• o_n is the observed outcome (1 if the event occurs; 0 if the event does not occur)
• BS varies from 0 (perfect deterministic forecasts) to 1 (perfectly wrong!)
$\mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c}$
• The Brier skill score (BSS) is a measure of skill relative to climatology (p = frequency of the event in the climate sample)
• positive (negative) BSS: better (worse) than reference
Brier Score = Reliability − Resolution + Uncertainty
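For the direct form of BS and BSS, a minimal sketch (the sample probabilities and outcomes below are invented for illustration):

```python
import numpy as np

def brier_score(p, o):
    """Brier score for forecast probabilities p and binary outcomes o."""
    p, o = np.asarray(p, float), np.asarray(o, float)
    return np.mean((p - o) ** 2)

# Illustrative sample: ensemble probabilities and whether the event occurred.
p_fc = np.array([0.9, 0.7, 0.2, 0.0, 0.6, 0.1])
obs  = np.array([1,   1,   0,   0,   1,   0  ])

bs = brier_score(p_fc, obs)
bs_clim = brier_score(np.full_like(p_fc, obs.mean()), obs)  # climatology reference
bss = 1.0 - bs / bs_clim
print(f"BS = {bs:.3f}, BS_clim = {bs_clim:.3f}, BSS = {bss:.3f}")
```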
Reliability: 2m-Temp.>0
[Reliability diagrams for 2m-Temp > 0°C, one panel per forecast system (CERFACS, CNRM, ECMWF, INGV, LODYC, MPI, UKMO and the DEMETER multi-model), each annotated with its BSS, Rel-Sc and Res-Sc values]
1 month lead, start date May, 1980-2001
Brier Skill Score
Europe: 850hPa Temperature, D+4
[Time series, 1994-2006: Brier skill score (vs. long-term climatology, 3-month moving sample) of probability forecasts verified against analyses; fc step 96 h, T850 anomalies exceeding −8 K, −4 K, +4 K and +8 K; y-axis from −0.6 to 0.8]
Ranked Probability Score
• Measures the quadratic distance between forecast and verification probabilities for several categories
$\mathrm{RPS} = \frac{1}{K-1}\sum_{k=1}^{K} \mathrm{BS}_k$
• It is the average Brier score across the range of the variable: BS_k is the Brier score for the k-th of K cumulative categories
• The Ranked Probability Skill Score (RPSS) is a measure of skill relative to a reference forecast:
$\mathrm{RPSS} = 1 - \frac{\mathrm{RPS}}{\mathrm{RPS}_c}$
• positive (negative) RPSS: better (worse) than reference
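A minimal sketch of the RPS as an average of cumulative Brier scores (the category probabilities below are illustrative, not from the slides):

```python
import numpy as np

def rps(p_cat, obs_cat):
    """Ranked probability score for one forecast.

    p_cat:   forecast probability of each of the K ordered categories
    obs_cat: index of the observed category
    """
    p_cat = np.asarray(p_cat, float)
    K = p_cat.size
    p_cum = np.cumsum(p_cat)                 # cumulative forecast probabilities
    o_cum = (np.arange(K) >= obs_cat) * 1.0  # cumulative observation (0/1 step)
    return np.sum((p_cum - o_cum) ** 2) / (K - 1)

# Illustrative 5-category forecast; the event verified in category 2 (0-based).
print(f"RPS = {rps([0.1, 0.2, 0.4, 0.2, 0.1], obs_cat=2):.3f}")
```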
Brier Score -> Ranked Probability Score
• The Brier Score is used for two-category (yes/no) situations (e.g. T > 15°C)
[Schematic: forecast PDF with a single threshold at 15°C]
• The RPS takes into account the ordered nature of the variable ("extreme errors")
[Schematic: forecast PDF split into categories at 5, 10, 15, 20, 25°C]
Ranked Probability Skill Score
Northern Hemisphere: 500hPa Geopotential
Verification of two category (yes/no) situation
• Compute 2 x 2 contingency table: (for a set of cases)
• Event Probability: s = (a+c) / n
• Probability of a Forecast of occurrence: r = (a+b) / n
• Frequency Bias: B = (a+b) / (a+c)
• Proportion Correct: PC = (a+d) / n
                      Event observed
                      Yes     No      total
Event forecast  Yes    a       b      a+b
                No     c       d      c+d
                total  a+c     b+d    a+b+c+d = n
Example of Finley Tornado Forecasts (1884)
• Compute 2 x 2 contingency table: (for a set of cases)
                      Event observed
                      Yes     No     total
Event forecast  Yes    28      72     100
                No     23    2680    2703
                total  51    2752    2803

• Event Probability: s = (a+c) / n = 51/2803 = 0.018
• Probability of a Forecast of occurrence: r = (a+b) / n = 100/2803 = 0.036
• Frequency Bias: B = (a+b) / (a+c) = 100/51 = 1.961
• Proportion Correct: PC = (a+d) / n = 2708/2803 = 0.966
96.6% Accuracy
Example of Finley Tornado Forecasts (1884)
• Compute 2 x 2 contingency table: (for a set of cases)
                      Event observed
                      Yes     No     total
Event forecast  Yes     0       0       0
                No     51    2752    2803
                total  51    2752    2803

• Event Probability: s = (a+c) / n = 51/2803 = 0.018
• Probability of a Forecast of occurrence: r = (a+b) / n = 0/2803 = 0.0
• Frequency Bias: B = (a+b) / (a+c) = 0/51 = 0.0
• Proportion Correct: PC = (a+d) / n = 2752/2803 = 0.982
98.2% Accuracy!
Some Scores and Skill Scores
Score                       Formula                                   Finley      Finley          Finley
                                                                      (original)  (never fc T.)   (always fc T.)
Proportion Correct          PC = (a+d)/n                              0.966       0.982           0.018
Threat Score                TS = a/(a+b+c)                            0.228       0.000           0.018
Odds Ratio                  Θ = (ad)/(bc)                             45.3        -               -
Odds Ratio Skill Score      Q = (ad-bc)/(ad+bc)                       0.957       -               -
Heidke Skill Score          HSS = 2(ad-bc)/[(a+c)(c+d)+(a+b)(b+d)]    0.355       0.0             0.0
Peirce Skill Score          PSS = (ad-bc)/[(a+c)(b+d)]                0.523       0.0             0.0
Clayton Skill Score         CSS = (ad-bc)/[(a+b)(c+d)]                0.271       -               -
Gilbert Skill Score (ETS)   GSS = (a-a_ref)/(a-a_ref+b+c),            0.216       0.0             0.0
                            a_ref = (a+b)(a+c)/n
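A sketch computing these scores from the four cells of the contingency table (the function name is an illustrative choice); applied to the original Finley counts it reproduces the first column above:

```python
def contingency_scores(a, b, c, d):
    """Skill scores from a 2x2 contingency table
    (hits a, false alarms b, misses c, correct rejections d)."""
    n = a + b + c + d
    a_ref = (a + b) * (a + c) / n  # hits expected by chance
    return {
        "PC":  (a + d) / n,
        "TS":  a / (a + b + c),
        "OddsRatio": (a * d) / (b * c) if b * c else float("inf"),
        "Q":   (a * d - b * c) / (a * d + b * c),
        "HSS": 2 * (a * d - b * c) / ((a + c) * (c + d) + (a + b) * (b + d)),
        "PSS": (a * d - b * c) / ((a + c) * (b + d)),
        "CSS": (a * d - b * c) / ((a + b) * (c + d)),
        "GSS": (a - a_ref) / (a - a_ref + b + c),
    }

# Finley (original): reproduces PC=0.966, TS=0.228, HSS=0.355, PSS=0.523, ...
for name, value in contingency_scores(28, 72, 23, 2680).items():
    print(f"{name:>9s} = {value:.3f}")
```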
Definition of a proper score
• Consistency is one of the characteristics of a good forecast
• Some scoring rules encourage forecasters to be inconsistent, e.g. some scores give better results when a forecast closer to climatology is issued rather than the actual forecast (e.g. reliability)
• A scoring rule is strictly proper when the best scores are obtained if and only if the forecasts correspond with the forecaster's judgement
• Examples of proper scores are the Brier Score and the Ignorance Score
Ignorance Score:  $\mathrm{IGN} = -\frac{1}{n}\sum_{n}\sum_{i} p_{n,i,\mathrm{ver}} \ln p_{n,i,\mathrm{fc}}$
• n: forecast-verification pairs, i: quantiles
• Minimum only when p_fc = p_ver -> proper score
• The lower/higher the IGN, the better/worse the forecast system
(see Roulston & Smith, 2001)
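A minimal sketch of the ignorance score for categorical forecasts (here the verification distribution is taken to be one-hot, i.e. all probability on the observed quantile; sample values are invented, and a small floor keeps ln(0) finite):

```python
import numpy as np

def ignorance(p_fc, obs_idx, floor=1e-10):
    """Mean negative log-likelihood of the observed category.

    p_fc:    (n_cases, n_categories) forecast probabilities
    obs_idx: observed category index for each case
    """
    p_fc = np.clip(np.asarray(p_fc, float), floor, 1.0)
    picked = p_fc[np.arange(len(obs_idx)), obs_idx]  # probability given to what occurred
    return -np.mean(np.log(picked))

# Two illustrative 3-category forecasts; observations fell in categories 0 and 2.
print(f"IGN = {ignorance([[0.6, 0.3, 0.1], [0.2, 0.3, 0.5]], [0, 2]):.3f}")
```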
Verification of two category (yes/no) situation
• Compute 2 x 2 contingency table: (for a set of cases)
• Event Probability: s = (a+c) / n
• Probability of a Forecast of occurrence: r = (a+b) / n
• Frequency Bias: B = (a+b) / (a+c)
• Hit Rate: H = a / (a+c)
• False Alarm Rate: F = b / (b+d)
• False Alarm Ratio: FAR = b / (a+b)
                      Event observed
                      Yes     No      total
Event forecast  Yes    a       b      a+b
                No     c       d      c+d
                total  a+c     b+d    a+b+c+d = n
Example of Finley Tornado Forecasts (1884)
• Compute 2 x 2 contingency table: (for a set of cases)
                      Event observed
                      Yes     No     total
Event forecast  Yes    28      72     100
                No     23    2680    2703
                total  51    2752    2803
• Event Probability: s = (a+c) / n = 0.018
• Probability of a Forecast of occurrence: r = (a+b) / n = 0.036
• Frequency Bias: B = (a+b) / (a+c) = 1.961
• Hit Rate: H = a / (a+c) = 0.549
• False Alarm Rate: F = b / (b+d) = 0.026
• False Alarm Ratio: FAR = b / (a+b) = 0.720
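These three rates follow directly from the same table; a quick check against the values above (variable names are illustrative):

```python
a, b, c, d = 28, 72, 23, 2680  # Finley tornado forecasts

H   = a / (a + c)  # Hit Rate: fraction of observed events that were forecast
F   = b / (b + d)  # False Alarm Rate: fraction of non-events forecast as events
FAR = b / (a + b)  # False Alarm Ratio: fraction of "yes" forecasts that failed
print(f"H = {H:.3f}, F = {F:.3f}, FAR = {FAR:.3f}")  # 0.549, 0.026, 0.720
```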
Extension of 2 x 2 contingency table for prob. FC

                  Event observed
Event forecast    Yes    No        threshold      H         F
>80% - 100%        30     5        >80%        30/105     5/105
>60% - 80%         25    10        >60%        55/105    15/105
>40% - 60%         20    15        >40%        75/105    30/105
>20% - 40%         15    20        >20%        90/105    50/105
>0% - 20%          10    25        >0%        100/105    75/105
0%                  5    30        (all)      105/105   105/105
total             105   105
Extension of 2 x 2 contingency table for prob. FC
                  Event observed
Event forecast    Yes    No        threshold      H       F
>80% - 100%        30     5        >80%         0.29    0.05
>60% - 80%         25    10        >60%         0.52    0.14
>40% - 60%         20    15        >40%         0.71    0.29
>20% - 40%         15    20        >20%         0.86    0.48
>0% - 20%          10    25        >0%          0.95    0.71
0%                  5    30        (all)        1.00    1.00
total             105   105

[Plot: Hit Rate against False Alarm Rate for these thresholds, both axes 0-1]
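A sketch of how the threshold columns follow from the bin counts by cumulative summation (bin counts taken from the table above; variable names are illustrative):

```python
import numpy as np

# Observed yes/no counts per forecast-probability bin, highest bin first.
yes = np.array([30, 25, 20, 15, 10, 5])   # event observed
no  = np.array([ 5, 10, 15, 20, 25, 30])  # event not observed

H = np.cumsum(yes) / yes.sum()  # hit rate for thresholds >80%, >60%, ..., >0%, all
F = np.cumsum(no)  / no.sum()   # false alarm rate for the same thresholds
for t, h, f in zip([">80%", ">60%", ">40%", ">20%", ">0%", "(all)"], H, F):
    print(f"{t:>5s}: H = {h:.2f}, F = {f:.2f}")
```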
ROC curve
• The ROC curve is a plot of H against F for a range of probability thresholds (low thresholds plot towards the upper right, high thresholds towards the lower left)
• The ROC area A (area under the ROC curve) is a skill measure: A = 0.5 (no skill), A = 1 (perfect deterministic forecast)
[ROC curve for the example above, Hit Rate vs. False Alarm Rate: A = 0.83]
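A sketch of the ROC area by trapezoidal integration over the (F, H) points from the table above. Note this simple estimate gives A ≈ 0.78 for the example table; the slide's A = 0.83 may come from a fitted curve, so the two need not agree exactly:

```python
import numpy as np

# (F, H) points from the threshold table above, with the (0, 0) end point added.
F = np.array([0, 5, 15, 30, 50, 75, 105]) / 105
H = np.array([0, 30, 55, 75, 90, 100, 105]) / 105

A = np.sum((F[1:] - F[:-1]) * (H[1:] + H[:-1]) / 2)  # trapezoidal rule
print(f"A = {A:.2f}, ROCSS = 2A - 1 = {2 * A - 1:.2f}")
```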
ROC area
ROCA vs. RPSS vs. BSS
ROCSS vs. BSS
$\mathrm{ROCSS} = 2A - 1 \qquad\qquad \mathrm{BSS} = 1 - \frac{\mathrm{BS}}{\mathrm{BS}_c}$
• ROCSS or BSS > 0 indicates a skilful forecast system
[Maps: ROC skill score vs. Brier skill score; Northern Extra-Tropics, 500 hPa anomalies > 2σ, spring 2002; Richardson, 2005]
Benefits for different users - decision making
• A user (or "decision maker") is sensitive to a specific weather event
• The user has a choice of two actions: do nothing and risk a potential loss L if the weather event occurs, or take preventative action at a cost C to protect against the loss L
• No forecast information: either always take action or never take action
• Deterministic forecast: act when adverse weather is predicted
• Probability forecast: act when the probability of the specific event exceeds a certain threshold; this threshold depends on the user
• Value V of a forecast: savings made by using the forecast, normalised so that V = 1 for a perfect forecast and V = 0 for a forecast no better than climatology
• Simplest possible case, but it shows many important features (see also Richardson, 2000)
Decision making: the cost-loss model
• Climate information - expense: $E_C = \min(C, oL)$
• Always use forecast - expense: $E_F = aC + bC + cL$
• Perfect forecast - expense: $E_P = oC$
• Value: $V = \dfrac{\text{saving from using forecast}}{\text{saving from perfect forecast}} = \dfrac{E_C - E_F}{E_C - E_P}$

Potential costs:
                 Event occurs
                 Yes    No
Action     Yes    C      C
taken      No     L      0

Fraction of occurrences:
                 Event occurs
                 Yes    No
Event      Yes    a      b
forecast   No     c      d
                  o     1-o
Decision making: the cost-loss model
$V = \dfrac{\text{saving from using forecast}}{\text{saving from perfect forecast}} = \dfrac{E_C - E_F}{E_C - E_P} = \dfrac{\min(\alpha, o) - F\alpha(1-o) + Ho(1-\alpha) - o}{\min(\alpha, o) - o\alpha}$
with: α = C/L,  H = a/(a+c),  F = b/(b+d),  o = a+c

[Value curve: Northern Extra-Tropics (winter 01/02), D+5 deterministic FC > 1 mm precipitation]
• For a given weather event and FC system, o, H and F are fixed
• value depends on C/L
• value is maximised when C/L = o
• V_max = H − F
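A sketch of this value calculation over a range of cost-loss ratios (the H, F and o values below are invented for illustration, not read off the slide; it confirms the maximum V = H − F at C/L = o):

```python
import numpy as np

def value(alpha, H, F, o):
    """Potential economic value V for cost-loss ratio alpha = C/L."""
    e_clim = np.minimum(alpha, o)                               # E_C / L
    e_fc   = H * o * alpha + F * (1 - o) * alpha + (1 - H) * o  # E_F / L
    e_perf = o * alpha                                          # E_P / L
    return (e_clim - e_fc) / (e_clim - e_perf)

H, F, o = 0.6, 0.1, 0.3   # illustrative hit rate, false alarm rate, base rate
alphas = np.linspace(0.05, 0.95, 19)
v = value(alphas, H, F, o)
print(f"max V = {v.max():.2f} near C/L = {alphas[v.argmax()]:.2f}  (H - F = {H - F:.2f})")
```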
Potential economic value
Northern Extra-Tropics (winter 01/02), D+5 FC > 1 mm precipitation
[Value curves: deterministic forecast vs. EPS, for probability thresholds p = 0.2, p = 0.5, p = 0.8]
Potential economic value
Northern Extra-Tropics (winter 01/02), FC > 1 mm precipitation
EPS: each user chooses the most appropriate probability threshold
[Value curves: Control vs. EPS]
Results based on simple cost-loss models have indicated that EPS probabilistic forecasts have a higher value than single deterministic forecasts
Potential economic value
Northern Extra-Tropics (winter 01/02), D+5 FC > 20 mm precipitation
• BSS = 0.06 (a measure of overall value for all possible users)
• ROCSS = 0.65 (closely linked to V_max)
Summary
• Different ways of incorporating the added dimension of the EPS (EM vs. PDF)
• The ensemble mean is the best deterministic forecast; the EM should be used together with a measure of spread
• Verification of probability forecasts: different scores measure different aspects of forecast performance (Reliability/Resolution, Brier Score (BSS), RPS (RPSS), ROC, …); the perceived usefulness of the ensemble may vary with the score used; it is important to understand the behaviour of the different scores and choose appropriately
• Potential economic value: decision making is user dependent; the cost-loss model is a simple illustration, but it shows many useful features
References and further reading
• Katz, R. W. and A. H. Murphy, 1997: Economic Value of Weather and Climate Forecasts. Cambridge University Press, pp. 222.
• Roulston, M. S. and L. A. Smith, 2001: Evaluating probabilistic forecasts using information theory. Monthly Weather Review, 130, 1653-1660.
• Palmer, T. N. and R. Hagedorn (editors), 2006: Predictability of Weather and Climate. Cambridge University Press (available from July 2006).
• Jolliffe, I. T. and D. B. Stephenson, 2003: Forecast Verification: A Practitioner's Guide in Atmospheric Science. Wiley, pp. 240.
• Wilks, D. S., 2006: Statistical Methods in the Atmospheric Sciences. 2nd ed., Academic Press, pp. 627.
• ECMWF Newsletter for updates on EPS performance.
• Richardson, D. S., 2000: Skill and relative economic value of the ECMWF Ensemble Prediction System. Q. J. R. Meteorol. Soc., 126, 649-668.