Verification of ensemble systems

Chiara Marsigli

ARPA-SIMC

Deterministic forecasts

Event E

e.g.: the precipitation cumulated over 24 hours at a given location (raingauge, radar pixel, hydrological basin, area) exceeds 20 mm

the event is observed with frequency o(E)

yes: o(E) = 1
no: o(E) = 0

the event is forecast with probability p(E)

yes: p(E) = 1
no: p(E) = 0

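Probabilistic forecasts

Event E

e.g.: the precipitation cumulated over 24 hours at a given location exceeds 20 mm

the event is observed with frequency o(E)

yes: o(E) = 1
no: o(E) = 0

the event is forecast with probability p(E)

p(E) ∈ [0,1]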

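Ensemble forecasts

Event E

e.g.: the precipitation cumulated over 24 hours at a given location exceeds 20 mm

the event is observed with frequency o(E)

yes: o(E) = 1
no: o(E) = 0

M-member ensemble: the event is forecast with probability p(E) = k/M, where k is the number of members forecasting the event

no member: p(E) = 0
all members: p(E) = 1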
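As a minimal sketch of the ensemble probability p(E) = k/M, the Python snippet below counts the members that exceed the threshold. The function name `ensemble_probability` and the sample values are illustrative assumptions, not part of the original slides.

```python
def ensemble_probability(members, threshold):
    """Return p(E) = k/M for the event 'forecast value exceeds threshold'."""
    if not members:
        raise ValueError("the ensemble must have at least one member")
    # k = number of members forecasting the event
    k = sum(1 for value in members if value > threshold)
    return k / len(members)


# Hypothetical 24 h precipitation forecasts (mm) from a 5-member ensemble.
members_mm = [12.0, 25.5, 31.0, 18.2, 22.4]
p = ensemble_probability(members_mm, threshold=20.0)
print(p)  # 3 of the 5 members exceed 20 mm
```

With no member above the threshold the function returns 0, and with all members above it returns 1, matching the two limiting cases listed above.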

Probabilistic forecasts

An accurate probability forecast system has:

reliability - agreement between forecast probability and mean observed frequency

sharpness - tendency to forecast probabilities near 0 or 1, as opposed to values clustered around the mean

resolution - ability of the forecast to resolve the set of sample events into subsets with characteristically different outcomes

Brier Score

Scalar summary measure for the assessment of the forecast performance: the mean squared error of the probability forecast.

$$ BS = \frac{1}{n} \sum_{k=1}^{n} (f_k - o_k)^2 $$

• n = number of points in the "domain" (spatio-temporal)
• $o_k$ = 1 if the event occurs, $o_k$ = 0 if the event does not occur
• $f_k$ is the probability of occurrence according to the forecast system (e.g. the fraction of ensemble members forecasting the event)
• BS can take on values in the range [0,1], a perfect forecast having BS = 0

Sensitive to the climatological frequency of the event: the rarer an event, the easier it is to get a good BS without having any real skill.
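The Brier Score definition can be sketched in a few lines of Python. The function name `brier_score` and the sample data are assumptions made for illustration. The second example shows the sensitivity to rare events: a forecast that never predicts an event occurring once in 100 cases still gets a very low score.

```python
def brier_score(forecast_probs, observations):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecast_probs) != len(observations) or not observations:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observations)
    return sum((f - o) ** 2 for f, o in zip(forecast_probs, observations)) / n


# Hypothetical forecasts and outcomes at four points of the domain.
bs = brier_score([0.0, 0.2, 0.8, 1.0], [0, 0, 1, 1])

# Rare event: observed once in 100 cases, never forecast.
rare_observations = [1] + [0] * 99
bs_rare = brier_score([0.0] * 100, rare_observations)
print(bs, bs_rare)
```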

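Brier Score decomposition (Murphy, 1973)

• M = ensemble size
• k = 0, …, M = number of ensemble members forecasting the event (probability classes)
• N = total number of points in the verification domain
• $N_k$ = number of points where the event is forecast by k members
• $f_k$ = forecast probability of class k (k/M for an M-member ensemble)
• $\bar{o}_k = \frac{1}{N_k} \sum_{i=1}^{N_k} o_i$ = frequency of the event in the sub-sample $N_k$
• $\bar{o}$ = total frequency of the event (sample climatology)

$$ BS = \underbrace{\frac{1}{N} \sum_{k=0}^{M} N_k (f_k - \bar{o}_k)^2}_{\text{reliability}} - \underbrace{\frac{1}{N} \sum_{k=0}^{M} N_k (\bar{o}_k - \bar{o})^2}_{\text{resolution}} + \underbrace{\bar{o} (1 - \bar{o})}_{\text{uncertainty}} $$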
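The decomposition can be checked numerically. The sketch below assumes that each point is described by the number of members forecasting the event (so that the forecast probability is k/M); the function name `brier_decomposition` and the sample data are hypothetical.

```python
def brier_decomposition(member_counts, observations, ensemble_size):
    """Return (reliability, resolution, uncertainty) for forecasts f_k = k/M."""
    n_total = len(observations)
    o_bar = sum(observations) / n_total  # sample climatology
    reliability = 0.0
    resolution = 0.0
    for k in range(ensemble_size + 1):
        # Sub-sample of points where exactly k members forecast the event.
        obs_k = [o for c, o in zip(member_counts, observations) if c == k]
        n_k = len(obs_k)
        if n_k == 0:
            continue  # empty probability class contributes nothing
        f_k = k / ensemble_size
        o_bar_k = sum(obs_k) / n_k
        reliability += n_k * (f_k - o_bar_k) ** 2
        resolution += n_k * (o_bar_k - o_bar) ** 2
    return reliability / n_total, resolution / n_total, o_bar * (1 - o_bar)


# Hypothetical 2-member ensemble verified at 8 points.
counts = [0, 0, 1, 1, 2, 2, 0, 1]
observed = [0, 0, 0, 1, 1, 1, 1, 0]
rel, res, unc = brier_decomposition(counts, observed, ensemble_size=2)

# Direct Brier Score for comparison.
direct_bs = sum((c / 2 - o) ** 2 for c, o in zip(counts, observed)) / len(observed)
print(rel - res + unc, direct_bs)
```

The two printed values agree, which is the identity reliability - resolution + uncertainty = BS for this sample.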

Brier Skill Score

Measures the improvement of the probabilistic forecast relative to a reference forecast (e.g. sample climatology).

$$ BSS = \frac{BS_{cli} - BS}{BS_{cli}}, \qquad BS_{cli} = \bar{o} (1 - \bar{o}) $$

$\bar{o}$ = total frequency of the event (sample climatology)

The forecast system has predictive skill if BSS is positive, a perfect system having BSS = 1.
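A short Python sketch of the skill score with sample climatology as reference follows. The function name `brier_skill_score` and the data are illustrative assumptions. The score is undefined when the event always or never occurs in the sample, so that case raises an error here.

```python
def brier_skill_score(forecast_probs, observations):
    """BSS = (BS_cli - BS) / BS_cli with BS_cli = o_bar * (1 - o_bar)."""
    n = len(observations)
    o_bar = sum(observations) / n
    bs = sum((f - o) ** 2 for f, o in zip(forecast_probs, observations)) / n
    bs_cli = o_bar * (1 - o_bar)
    if bs_cli == 0:
        raise ValueError("BSS is undefined when the event always or never occurs")
    return (bs_cli - bs) / bs_cli


observed = [0, 0, 1, 1]
bss = brier_skill_score([0.0, 0.2, 0.8, 1.0], observed)       # skilful forecast
bss_perfect = brier_skill_score([0.0, 0.0, 1.0, 1.0], observed)
bss_climate = brier_skill_score([0.5, 0.5, 0.5, 0.5], observed)  # forecasts o_bar everywhere
print(bss, bss_perfect, bss_climate)
```

Forecasting the sample climatology everywhere gives BSS = 0, the boundary between skilful and unskilful systems.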

Ranked Probability Score

Extension of the Brier Score to the multi-event situation. The squared errors are computed with respect to the cumulative probabilities in the forecast and observation vectors.

$$ RPS = \frac{1}{M-1} \sum_{m=1}^{M} \left[ \sum_{k=1}^{m} f_k - \sum_{k=1}^{m} o_k \right]^2 $$

• M = number of forecast categories
• $o_k$ = 1 if the event occurs in category k, $o_k$ = 0 if the event does not occur in category k
• $f_k$ is the probability of occurrence in category k according to the forecast system (e.g. the fraction of ensemble members forecasting the event in that category)
• RPS takes on values in the range [0,1], a perfect forecast having RPS = 0
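The RPS for a single forecast can be sketched as follows. The function name `ranked_probability_score`, the zero-based category index, and the three-category example are assumptions made for illustration.

```python
from itertools import accumulate


def ranked_probability_score(forecast_probs, observed_category):
    """RPS of one forecast; observed_category is a zero-based category index."""
    m = len(forecast_probs)
    if m < 2:
        raise ValueError("at least two categories are required")
    obs = [1 if k == observed_category else 0 for k in range(m)]
    # Cumulative forecast and observation vectors.
    cum_f = list(accumulate(forecast_probs))
    cum_o = list(accumulate(obs))
    return sum((cf - co) ** 2 for cf, co in zip(cum_f, cum_o)) / (m - 1)


# Hypothetical three-category forecast (e.g. dry / moderate / heavy).
rps = ranked_probability_score([0.2, 0.5, 0.3], observed_category=1)
rps_perfect = ranked_probability_score([0.0, 1.0, 0.0], observed_category=1)
rps_worst = ranked_probability_score([1.0, 0.0, 0.0], observed_category=2)
print(rps, rps_perfect, rps_worst)
```

Because the errors are taken on cumulative probabilities, a forecast that puts its weight far from the observed category is penalised more than one that misses by a single category.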

ROC Curves (Relative Operating Characteristics, Mason and Graham, 1999)

Contingency table

                     Observed
                     Yes    No
Forecast    Yes       a      b
            No        c      d

A contingency table can be built for each probability class (a probability class can be defined as the percentage of ensemble members which actually forecast a given event).

Hit Rate

$$ H = \frac{a}{a+c} = \frac{\text{number of correct forecasts of the event}}{\text{total number of occurrences of the event}} $$

False Alarm Rate

$$ F = \frac{b}{b+d} = \frac{\text{number of non-correct forecasts of the event}}{\text{total number of non-occurrences of the event}} $$

For the k-th probability class:

$$ H_k = \sum_{i=k}^{M} H_i, \qquad F_k = \sum_{i=k}^{M} F_i $$

where the terms on the right-hand side are the contributions of the single classes i (event forecast by exactly i members).

Hit rates are plotted against the corresponding false alarm rates to generate the ROC Curve. The area under the ROC curve is used as a statistical measure of forecast usefulness. A value of 0.5 indicates that the forecast system has no skill.

ROC Curve

k-th probability class: E is forecast if it is forecast by at least k ensemble members => a warning can be issued when the forecast probability for the predefined event exceeds some threshold

"At least 0 members" (always)

"At least M+1 members" (never)

[Figure: ROC curve, hit rate plotted against false alarm rate, one point per probability class]
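The construction of the ROC points can be sketched in Python. The sketch assumes each point of the domain is described by the number of members forecasting the event; the function names `roc_points` and `roc_area`, the trapezoidal integration, and the sample data are assumptions, not taken from the slides.

```python
def roc_points(member_counts, observations, ensemble_size):
    """(F, H) pairs for the thresholds 'at least k members', k = M+1, ..., 0."""
    n_events = sum(observations)
    n_non_events = len(observations) - n_events
    if n_events == 0 or n_non_events == 0:
        raise ValueError("both occurrences and non-occurrences are needed")
    points = []
    for k in range(ensemble_size + 1, -1, -1):
        pairs = list(zip(member_counts, observations))
        a = sum(1 for c, o in pairs if c >= k and o == 1)  # hits
        b = sum(1 for c, o in pairs if c >= k and o == 0)  # false alarms
        points.append((b / n_non_events, a / n_events))
    return points


def roc_area(points):
    """Area under the ROC curve by the trapezoidal rule (an assumed choice)."""
    area = 0.0
    for (f0, h0), (f1, h1) in zip(points, points[1:]):
        area += (f1 - f0) * (h0 + h1) / 2
    return area


# Hypothetical 2-member ensemble verified at 8 points.
counts = [0, 0, 1, 1, 2, 2, 0, 1]
observed = [0, 0, 0, 1, 1, 1, 1, 0]
points = roc_points(counts, observed, ensemble_size=2)
area = roc_area(points)
print(points, area)
```

The threshold "at least M+1 members" gives the point (0, 0) and "at least 0 members" gives (1, 1), so the curve always joins the two corners of the diagram.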

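Cost-loss Analysis

Decisional model

                          E happens
                          yes    no
U takes action    yes      C      C
                  no       L      0

Contingency table

                     Observed
                     Yes    No
Forecast    Yes       a      b
            No        c      d

With a deterministic forecast system, the mean expense per unit loss is:

$$ ME = \frac{c \cdot L + (a + b) \cdot C}{L} = F \frac{C}{L} (1 - \bar{o}) - H \bar{o} \left(1 - \frac{C}{L}\right) + \bar{o} $$

with a, b, c, d expressed as relative frequencies (a + b + c + d = 1).

$\bar{o} = a + c$ is the sample climatology (the observed frequency)

Value

$$ V_k = \frac{ME_{cli} - ME_{f,k}}{ME_{cli} - ME_{p}} $$

Gain obtained using the system instead of the climatological information, as a percentage of the gain obtained using a perfect system. $ME_{f,k}$ is the mean expense of the forecast system for the probability threshold k, $ME_{cli}$ that of the climatological information and $ME_{p}$ that of a perfect system.

If the forecast system is probabilistic, the user has to fix a probability threshold k.

When this threshold is exceeded, the user takes protective action.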

Cost-loss Analysis

Curves of Vk as a function of C/L, a curve for each probability threshold. The area under the envelope of the curves is the cost-loss area.
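A minimal Python sketch of the value computation for one threshold and one cost/loss ratio follows. The expressions used for the climatological and perfect mean expenses are not spelled out in the slides: they are obtained here by inserting H = F = 1 (always protect), H = F = 0 (never protect) and H = 1, F = 0 (perfect system) into the mean expense formula, with climatology taking the cheaper of the first two. The function names and sample numbers are hypothetical.

```python
def mean_expense(hit_rate, false_alarm_rate, o_bar, cost_loss_ratio):
    """Mean expense per unit loss: F*(C/L)*(1 - o) - H*o*(1 - C/L) + o."""
    return (false_alarm_rate * cost_loss_ratio * (1 - o_bar)
            - hit_rate * o_bar * (1 - cost_loss_ratio)
            + o_bar)


def relative_value(hit_rate, false_alarm_rate, o_bar, cost_loss_ratio):
    """V = (ME_cli - ME_f) / (ME_cli - ME_p) for one threshold and one C/L."""
    if not (0 < cost_loss_ratio < 1 and 0 < o_bar < 1):
        raise ValueError("C/L and the sample climatology must lie strictly in (0, 1)")
    me_forecast = mean_expense(hit_rate, false_alarm_rate, o_bar, cost_loss_ratio)
    # Assumption: climatology = cheaper of 'always protect' and 'never protect'.
    me_cli = min(cost_loss_ratio, o_bar)
    # Assumption: a perfect system pays C only when the event occurs.
    me_perfect = o_bar * cost_loss_ratio
    return (me_cli - me_forecast) / (me_cli - me_perfect)


# Hypothetical H and F for one probability threshold, with o_bar = C/L = 0.2.
value = relative_value(hit_rate=0.8, false_alarm_rate=0.1, o_bar=0.2, cost_loss_ratio=0.2)
value_perfect = relative_value(1.0, 0.0, 0.2, 0.2)
value_always = relative_value(1.0, 1.0, 0.2, 0.2)
print(value, value_perfect, value_always)
```

Evaluating `relative_value` over a range of C/L values for each threshold k gives the family of curves described above; their upper envelope is what the cost-loss area integrates.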

Reliability Diagram

o(p) is plotted against p for some finite binning of width dp

In a perfectly reliable system o(p) = p and the graph is a straight line oriented at 45° to the axes.

If the curve lies below the 45° line, the probabilities are overestimated.

If the curve lies above the 45° line, the probabilities are underestimated.
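The binning behind the reliability diagram can be sketched in Python. The equal-width bins, the function name `reliability_curve` and the sample data are assumptions chosen for illustration.

```python
def reliability_curve(forecast_probs, observations, n_bins=5):
    """Per non-empty bin: (mean forecast probability, observed frequency, count)."""
    sums_p = [0.0] * n_bins
    sums_o = [0] * n_bins
    counts = [0] * n_bins
    for p, o in zip(forecast_probs, observations):
        # Equal-width bins of width 1/n_bins; p = 1 falls in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        sums_p[index] += p
        sums_o[index] += o
        counts[index] += 1
    return [(sums_p[i] / counts[i], sums_o[i] / counts[i], counts[i])
            for i in range(n_bins) if counts[i] > 0]


# Hypothetical forecasts: low probabilities verify too often, high ones too rarely.
probs = [0.1, 0.1, 0.1, 0.1, 0.9, 0.9, 0.9, 0.9]
observed = [0, 0, 0, 1, 1, 1, 1, 0]
curve = reliability_curve(probs, observed, n_bins=2)
print(curve)
```

In this example the low-probability bin has an observed frequency of 0.25 against a forecast of 0.1 (curve above the 45° line, underestimation), while the high-probability bin has 0.75 against 0.9 (curve below the line, overestimation). The count in each bin is returned as well, since bins with few cases give noisy frequencies.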

Sharpness

Refers to the spread of the probability distributions.

It is expressed as the capability of the system to forecast extreme values, or values close to 0 or 1. The frequency of forecasts in each probability bin (shown in the histogram) shows the sharpness of the forecast.

Rank histogram (Talagrand Diagram)

Rank histogram of the distribution of the values forecast by an ensemble.

[Figure: the sorted forecast values V1, ..., V5 of a five-member ensemble divide the range of forecast values into the intervals I, II, III, IV; the two outer intervals are "outliers below the minimum" and "outliers above the maximum"]

Percentage of Outliers: percentage of points where the observed value lies out of the range of forecast values.
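A rank histogram and the percentage of outliers can be sketched as below. The function names and the data are hypothetical. The sketch assumes there are no ties between the observation and the members; ties (common for zero precipitation) would need an explicit rule, which is not covered here.

```python
def rank_histogram(ensemble_forecasts, observations):
    """Count how often the observation falls in each of the M+1 intervals."""
    n_members = len(ensemble_forecasts[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensemble_forecasts, observations):
        if len(members) != n_members:
            raise ValueError("all forecasts must have the same number of members")
        # Rank = number of members below the observation (assumes no ties).
        rank = sum(1 for value in members if value < obs)
        counts[rank] += 1
    return counts


def percentage_of_outliers(counts):
    """Share of cases in the two outer intervals, in percent."""
    return 100.0 * (counts[0] + counts[-1]) / sum(counts)


# Hypothetical 5-member ensemble verified at four points.
forecasts = [[1.0, 2.0, 3.0, 4.0, 5.0]] * 4
observed = [0.5, 2.5, 4.5, 6.0]
counts = rank_histogram(forecasts, observed)
outliers = percentage_of_outliers(counts)
print(counts, outliers)
```

The first and last entries of `counts` correspond to the outliers below the minimum and above the maximum.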

Bibliography

www.bom.gov.au/bmrc/wefor/staff/eee/verif/verif_web_page.html
www.ecmwf.int
Bougeault, P., 2003. WGNE recommendations on verification methods for numerical prediction of weather elements and severe weather events (CAS/JSC WGNE Report No. 18).
Jolliffe, I.T. and D.B. Stephenson, 2003. Forecast Verification: A Practitioner’s Guide in Atmospheric Science. Wiley.
Nurmi, P., 2003. Recommendations on the verification of local weather forecasts. ECMWF Technical Memorandum n. 430.
Stanski, H.R., L.J. Wilson and W.R. Burrows, 1989. Survey of Common Verification Methods in Meteorology (WMO Research Report No. 89-5).
Wilks, D.S., 1995. Statistical Methods in the Atmospheric Sciences. Academic Press, New York, 467 pp.
Hamill, T.M., 1999. Hypothesis tests for evaluating numerical precipitation forecasts. Wea. Forecasting, 14, 155-167.
Mason, S.J. and N.E. Graham, 1999. Conditional probabilities, relative operating characteristics and relative operating levels. Wea. Forecasting, 14, 713-725.
Murphy, A.H., 1973. A new vector partition of the probability score. J. Appl. Meteor., 12, 595-600.
Richardson, D.S., 2000. Skill and relative economic value of the ECMWF ensemble prediction system. Quart. J. Roy. Meteor. Soc., 126, 649-667.
Talagrand, O., R. Vautard and B. Strauss, 1997. Evaluation of probabilistic prediction systems. Proceedings, ECMWF Workshop on Predictability.
